pabelanger | maybe it isn't fixed | 00:00 |
pabelanger | ianw: mind adding http://logs.openstack.org/16/502316/3/check/gate-openstackci-beaker-ubuntu-trusty/241fbcf/console.html#_2017-10-11_23_55_43_441780 to your list of things to fix? | 00:00 |
pabelanger | that is blocking system-config patches from landing | 00:00 |
pabelanger | I'll look at mirror-update.o.o again | 00:01 |
*** vhosakot has joined #openstack-infra | 00:01 | |
ianw | hmm, ok | 00:02 |
jeblair | pabelanger: let me know what you see -- i don't understand what i'm seeing in /var/log/reprepro/ubuntu-mirror.log | 00:02 |
pabelanger | jeblair: I deleted the lockfile and I'm manually running reprepro update on ubuntu mirror | 00:03 |
*** yamahata has quit IRC | 00:03 | |
pabelanger | I _think_ we need to increase our timeout from 30 mins to something longer | 00:04 |
jeblair | pabelanger: ok, that explains the abbreviated output | 00:04 |
pabelanger | which then kills reprepro and leaves lockfile | 00:04 |
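A minimal sketch of the interaction being described, assuming a wrapper shaped roughly like the mirror-update cron job (the 30-minute limit, the kill grace period and the db/lockfile path are assumptions, not the real script): when timeout kills reprepro mid-run, reprepro's own lock is left behind and the next run refuses to start until it is deleted by hand, as above.

```bash
# Hypothetical wrapper shape: after 30 minutes timeout sends SIGTERM
# (then SIGKILL after the -k grace period), so reprepro never gets to
# remove the lock it keeps in its database directory.
timeout -k 1m 30m reprepro -VVV --confdir /etc/reprepro/ubuntu update

# Manual recovery after a killed run, matching what is described above:
# remove the stale lock and rerun the update by hand.
rm /afs/.openstack.org/mirror/ubuntu/db/lockfile   # assumed lock location
reprepro -VVV --confdir /etc/reprepro/ubuntu update
```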
*** dingyichen has joined #openstack-infra | 00:04 | |
*** srobert_ has joined #openstack-infra | 00:04 | |
jeblair | pabelanger: we still only release the volume if it's successful, right? (so i'm curious how we're getting out of sync) | 00:04 |
pabelanger | but, a few hours ago, I did run reprepro check and checkpool fast, and things looked correct | 00:04 |
pabelanger | jeblair: right, should only vos release when check / checkpool pass | 00:05 |
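A rough sketch of the guard being described (the volume name and exact flags are assumptions; the real mirror script may differ): the read-only AFS replicas that jobs consume are only published after reprepro's own consistency checks pass, which is why a half-finished update should not normally leak out.

```bash
#!/bin/bash
set -e   # any failing check below aborts before the release

# reprepro's consistency checks against the read-write volume
reprepro --confdir /etc/reprepro/ubuntu check
reprepro --confdir /etc/reprepro/ubuntu checkpool fast

# Only reached when both checks succeed: publish the read-write volume
# to the read-only replicas.  "mirror.ubuntu" is an assumed volume name.
vos release mirror.ubuntu -verbose
```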
pabelanger | processing updates for 'xenial-security|main|amd64' | 00:05 |
pabelanger | currently | 00:05 |
*** gmann_afk is now known as gmann | 00:06 | |
*** ijw has quit IRC | 00:07 | |
*** ijw has joined #openstack-infra | 00:08 | |
*** sree has joined #openstack-infra | 00:08 | |
*** srobert has quit IRC | 00:08 | |
*** Swami has quit IRC | 00:08 | |
ianw | pabelanger: the fact this is in keyring & cryptography ... possibly related to missing packages? | 00:08 |
ianw | it does not trivially reproduce on a trusty node in a virtualenv | 00:09 |
*** vhosakot has quit IRC | 00:09 | |
pabelanger | ianw: ya, I think I'm going to see about an autohold, once I figure out reprepro issue | 00:10 |
ianw | let me add that and kick one off | 00:10 |
*** sree has quit IRC | 00:12 | |
*** yamahata has joined #openstack-infra | 00:12 | |
*** sbezverk has joined #openstack-infra | 00:16 | |
*** edmondsw has joined #openstack-infra | 00:16 | |
ianw | pabelanger: http://paste.openstack.org/show/623394/ | 00:17 |
ianw | error in cryptography setup command: Invalid environment marker: python_version < '3' | 00:17 |
*** Goneri has quit IRC | 00:17 | |
clarkb | oh swift was running into that too | 00:17 |
notmyname | yup | 00:18 |
notmyname | had to update setuptools to a newer-than-distro version | 00:18 |
ianw | hmm, the root cause seems to be pip 7 ish | 00:18 |
ianw | why is the latest pip not on trusty | 00:18 |
ianw | to the build logs! | 00:19 |
ianw | http://logs.openstack.org/16/502316/3/check/gate-openstackci-beaker-ubuntu-trusty/241fbcf/console.html#_2017-10-11_23_51_08_431588 | 00:20 |
ianw | sudo pip install 'pip<8' 'virtualenv<14' | 00:20 |
ianw | why would you do that | 00:20 |
*** edmondsw has quit IRC | 00:21 | |
ianw | because http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/jobs/macros.yaml#n569 ... hmmm | 00:21 |
ianw | because of this https://review.openstack.org/#/c/270995/ ~2 year old patch | 00:22 |
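For reference, the error being chased here comes from build tooling that is too old to understand environment markers such as python_version < '3' in cryptography's metadata; a hedged sketch of how one might confirm and clear it on a held Trusty node (illustrative commands, not the fix that was ultimately merged):

```bash
# Inside the job's virtualenv on the held node: see what is actually in use.
pip --version
python -c 'import setuptools; print(setuptools.__version__)'

# The pinned pip<8 (and the setuptools it drags in) predates environment
# markers like  python_version < '3' , hence "Invalid environment marker".
# Illustrative remedy: upgrade the build tooling before installing.
pip install -U pip setuptools
pip install cryptography
```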
jeblair | does that mean that the ubuntu mirror is actually okay? | 00:22 |
*** gouthamr has joined #openstack-infra | 00:24 | |
pabelanger | no, I think there is an issue with the reprepro database for xenial-security | 00:26 |
*** vhosakot has joined #openstack-infra | 00:27 | |
jeblair | pabelanger: why's that? | 00:27 |
jeblair | tell me what you're looking at and what you see | 00:27 |
pabelanger | sure, 1 sec | 00:28 |
pabelanger | let me get pastebin | 00:28 |
openstackgerrit | Ian Wienand proposed openstack-infra/project-config master: Revert "Pin pip to <8 for openstackci-beaker jobs" https://review.openstack.org/511360 | 00:29 |
ianw | jeblair: remember what that's all about ^ ? i'm guessing no | 00:29 |
*** jkilpatr has quit IRC | 00:29 | |
jeblair | ianw: nope, sorry. | 00:30 |
openstackgerrit | Ian Wienand proposed openstack-infra/project-config master: Revert "Pin pip to <8 for openstackci-beaker jobs" https://review.openstack.org/511360 | 00:30 |
pabelanger | jeblair: http://paste.openstack.org/show/623396/ | 00:30 |
pabelanger | processing updates for 'u|xenial-security|main|amd64' | 00:30 |
pabelanger | doesn't look correct | 00:30 |
*** caphrim007 has joined #openstack-infra | 00:31 | |
*** andreas_s has joined #openstack-infra | 00:31 | |
pabelanger | and reprepro update command appears to have hung on what I posted | 00:31 |
pabelanger | I did use strace to look at pid, but I didn't see much going on | 00:31 |
ianw | pabelanger: dmesg ... i had an issue with zero sized files on AFS the other day doing the ceph stuff | 00:32 |
ianw | ? | 00:32 |
ianw | i had to clear everything out | 00:32 |
jeblair | i'm stracing it now, and it does not seem to be working -- i haven't even gotten the current system call returned. | 00:32 |
pabelanger | ianw: clear out from where? | 00:32 |
ianw | sorry, clear out the mirror and restart | 00:33 |
ianw | but this was just for the ceph luminous, so not big | 00:33 |
pabelanger | ah | 00:33 |
pabelanger | yah, I hope we don't need to do the same | 00:33 |
mnaser | xenial-security is probably not that big, unless you have to wipe everything :( | 00:34 |
pabelanger | right, I _think_ i've already cleared the files on xenial-security, but update still not happy | 00:34 |
openstackgerrit | Ian Wienand proposed openstack-infra/openstack-zuul-jobs master: Remove pin pip from beaker legacy jobs https://review.openstack.org/511361 | 00:34 |
pabelanger | which makes me think that when timeout killed reprepro before, something may have gotten corrupted in the database | 00:35 |
pabelanger | which, is possible, according to the warning it prints | 00:35 |
jeblair | pabelanger: so let's assume there's something unhappy about the afs client on mirror-update that has it stuck. how is it possible that we released an inconsistent volume? | 00:35 |
*** andreas_s has quit IRC | 00:35 | |
ianw | jeblair: of course, if we merge that unpin ... then do we break things even worse? :/ | 00:35 |
*** Apoorva_ has joined #openstack-infra | 00:36 | |
jeblair | ianw: zuulv3 would tell us pre-merge :/ | 00:36 |
pabelanger | jeblair: I don't think we did. I mean, I manually released a few hours ago, because I thought things were okay. But I believe the actual issue is that we built newer images with new packages, but our indexes were still old | 00:36 |
jeblair | pabelanger: ooh... so the underlying fix we need is to use our own mirrors when building images? | 00:37 |
pabelanger | and when we run apt-get install on bindep-fallback, it fails because we still point to old packages, while it expects newer | 00:37 |
pabelanger | jeblair: Yah, that is possible | 00:37 |
pabelanger | it would help prevent something like this I think | 00:37 |
jeblair | pabelanger: what do you mean by 'point to old packages'? | 00:37 |
pabelanger | I am speculating here, but give me a second | 00:38 |
*** srobert_ has quit IRC | 00:38 | |
pabelanger | jeblair: http://logs.openstack.org/c7/c722a78bea5d1a75cb204cc783b2480131bd5bc4/post/static-election-publish/d11a220/console.html#_2017-10-11_01_54_31_755588 | 00:38 |
jeblair | pabelanger: do you mean the index on the image is out of date? cause i thought the first thing we do after configure-mirrors is to apt-get update. the only thing i could see that could cause an inconsistency is to actually have a package installed on the image that causes a conflict | 00:38 |
pabelanger | that error to me means we already have libcurl4-gnutls-dev installed, but it is a newer version than what our index is saying it should be | 00:39 |
*** Apoorva has quit IRC | 00:39 | |
pabelanger | I think the indexes on the image are newer than the AFS mirrors, but because we apt-get clean in configure_mirror today, the image boots properly | 00:40 |
pabelanger | then, once we hit the old indexes, apt-get gets confused | 00:40 |
jeblair | apt-get clears out the index; clean should just clear out cached packages, i think. | 00:40 |
*** Apoorva_ has quit IRC | 00:40 | |
jeblair | i looked on a ready xenial node and see libcurl3-gnutls:amd64 7.47.0-1ubuntu2.2 installed, no libcurl4 | 00:41 |
pabelanger | the odd thing is, I _think_ this might work on zuulv3 jobs. I say think because I thought I saw a job pass properly an hour ago | 00:41 |
* EmilienM online for the next hour if needed | 00:41 | |
pabelanger | okay, it is possible I am wrong. So, please look and see if you find anything | 00:42 |
EmilienM | pabelanger: see #tripleo when you can | 00:43 |
jeblair | pabelanger: is that the current repo error, or the previous one? | 00:45 |
*** thorst has joined #openstack-infra | 00:46 | |
*** thorst has quit IRC | 00:46 | |
pabelanger | jeblair: I believe that has been the issue all along, clarkb right? | 00:46 |
jeblair | i just ran those commands on the xenial node i logged into and they worked | 00:47 |
pabelanger | ya | 00:47 |
pabelanger | http://logs.openstack.org/55/511255/1/check/legacy-devstack-gate-tox-py3-run-tests/8134612/job-output.txt.gz#_2017-10-11_15_52_26_842050 | 00:47 |
jeblair | pabelanger: those are both old runs though, are we sure that's still a problem? | 00:48 |
pabelanger | jeblair: which cloud? | 00:48 |
jeblair | pabelanger: rax-ord | 00:48 |
pabelanger | jeblair: no, I'm not 100% it is an issue still. | 00:49 |
pabelanger | I thought I fixed it a few hours ago | 00:49 |
pabelanger | but, when I started running reprepro update manually, and it stopped (hung), I assumed it still wasn't fixed | 00:49 |
pabelanger | so, possible this is a 2nd (new) issue | 00:50 |
jeblair | okay here's the recent error: http://logs.openstack.org/56/511356/1/check/gate-election-python35/e52bb05/console.html | 00:50 |
*** rook is now known as rook-afk | 00:50 | |
jeblair | https://etherpad.openstack.org/p/fkQc9nXfgN | 00:51 |
pabelanger | that is also ovh | 00:51 |
mnaser | jeblair i think what happens is 2.2 is installed, but then if you try and do apt-get install libcurl3-gnutls-devel, it will try to pull 2.3 | 00:51 |
*** s-shiono has joined #openstack-infra | 00:51 | |
mnaser | because it tries to install libcurl3-gnutls-devel-<whatever>-2.3 | 00:51 |
mnaser | and that wants libcurl3-gnutls-<foo>-2.3 which does not exist in the mirrors | 00:52 |
*** LindaWang has joined #openstack-infra | 00:52 | |
mnaser | (or didn't this morning at least) | 00:52 |
clarkb | the gnutls one is the one we've had all along | 00:52 |
jeblair | mnaser: right, though i just ran those commands on a 30m old rax-ord node and it only wanted to install 2.2 | 00:52 |
mnaser | apt-get update before doing that jeblair ? | 00:52 |
jeblair | so the question i now have is: under what circumstances does it want to install 2.3 | 00:53 |
jeblair | mnaser: yes | 00:53 |
jeblair | pabelanger seems to be suggesting we should look at the cloud region as a nexus | 00:53 |
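A sketch of the kind of check being run on these nodes to see when apt starts wanting the 2.3 packages (package names come from the failing logs above; no particular mirror URL is implied):

```bash
# Refresh indexes from whatever mirror configure-mirrors set up.
sudo apt-get update

# Compare installed versions against the mirror's candidates.  If the
# image was built against newer upstream indexes, the installed
# libcurl3-gnutls can be 7.47.0-1ubuntu2.3 while the mirror still only
# offers ...2.2, and the -dev package needs an exactly matching library,
# so the install below fails.
apt-cache policy libcurl3-gnutls libcurl4-gnutls-dev
apt-cache madison libcurl3-gnutls

# The step that fails in the jobs, for reference:
sudo apt-get install -y libcurl4-gnutls-dev
```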
tonyb | pabelanger: I'm still getting the gnutls issue :( | 00:53 |
mnaser | you're bringing up a good point here, the volume would have not been released | 00:53 |
mnaser | tonyb do you have logs of a failed job? | 00:53 |
pabelanger | yah, we are debugging now | 00:54 |
jeblair | mnaser, tonyb: i started an etherpad and put tonyb's links there: https://etherpad.openstack.org/p/fkQc9nXfgN | 00:54 |
tonyb | mnaser: reykjavik | 00:54 |
jeblair | both ovh-gra1 | 00:54 |
*** lewo` has quit IRC | 00:54 | |
tonyb | mnaser: http://logs.openstack.org/56/511356/1/check/gate-election-python27-ubuntu-xenial/695ae09/console.html#_2017-10-12_00_09_31_881052 stupid clipboard | 00:54 |
mnaser | tonyb np :) | 00:55 |
mnaser | jeblair ill try and search elasticsearch and see if the theory of ovh-gra1 only holds | 00:55 |
tonyb | jeblair: Thanks. | 00:55 |
jeblair | mnaser: cool, i'll try to see what i can find out about those nodes and image build times | 00:55 |
*** sbezverk has quit IRC | 00:55 | |
jeblair | we put the image build information on the node. we don't output it in all jobs. :( | 00:56 |
*** dhinesh has quit IRC | 00:56 | |
mnaser | jeblair do you build the image multiple times for different formats in nodepool | 00:57 |
mnaser | or build once then convert | 00:57 |
pabelanger | Oh | 00:57 |
clarkb | mnaser: once and convert | 00:57 |
jeblair | mnaser: once then convert | 00:57 |
pabelanger | jeblair: I think rax are using the old images | 00:57 |
jeblair | pabelanger: that's not surprising | 00:58 |
pabelanger | jeblair: I can see in nodepool we are still trying to upload xenial images | 00:58 |
jeblair | pabelanger: so the cloud connection is "broken everywhere but rax" | 00:58 |
pabelanger | if so, they were the last good images before the breakage | 00:58 |
jeblair | i'm going to assume that's the case for the moment and stop my investigations | 00:58 |
dmsimard | Wow, achievement unlocked. ARA mentioned in top comment of a frontpage HackerNews thread (without getting thrashed) https://news.ycombinator.com/item?id=15450594 | 00:58 |
pabelanger | kk | 00:59 |
*** priteau has joined #openstack-infra | 00:59 | |
jeblair | dmsimard: congrats! | 00:59 |
*** cuongnv has joined #openstack-infra | 00:59 | |
jeblair | i'm going to see if i can get myself on a non-rax node | 00:59 |
mnaser | elasticsearch isn't cooperating, it shows bars for events but the messages are not showing things :< | 00:59 |
mnaser | or at least it's taking a loooong time to load | 01:00 |
*** namnh has joined #openstack-infra | 01:00 | |
clarkb | mnaser: ya I noticed e-r is out to lunch too will have to investigate in the morning | 01:00 |
jeblair | ii curl 7.47.0-1ubuntu2.3 amd64 command line tool for transferring data with URL syntax | 01:00 |
mnaser | there we have it | 01:00 |
dmsimard | Sorry for distracting from the issues, I'll go back to my cave | 01:00 |
* tonyb is going to take a tangent and try to create a minimal bindep.txt for the election repo | 01:01 | |
jeblair | okay, so it does look like the problem is that images are newer than mirrors | 01:01 |
*** jcoufal has joined #openstack-infra | 01:01 | |
jeblair | tonyb: time well spent regardless! | 01:01 |
pabelanger | so, when I checked a few hours ago, I must have been looking at a rax node | 01:01 |
tonyb | jeblair: Yeah I've been putting it off as 'hard' | 01:01 |
jeblair | so fixes are: short-term: get a mirror update finished and released. long-term: build images with our mirrors | 01:01 |
mnaser | jeblair: compounded alongside the mirrors failing to update, boo | 01:01 |
fungi | tonyb: not hard at all. adding a bindep.txt is self-testing | 01:02 |
tonyb | fungi: hehe okay | 01:02 |
*** aeng has quit IRC | 01:02 | |
pabelanger | Yah, and reprepro update still hasn't moved past the pastebin from above, so I am guessing we've corrupted something with the timeout command | 01:03 |
jeblair | pabelanger: if you think afs is being weird, how about we reboot mirror-update? | 01:03 |
fungi | tonyb: just take the http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/data/bindep-fallback.txt and whittle it down to the things you think your jobs for that repo will need from a distro package perspective. odds are, on the election repo, the answer is "very little" | 01:03 |
pabelanger | jeblair: ya, happy to try that | 01:03 |
jeblair | pabelanger: the main process is still stuck and doing nothing | 01:03 |
jeblair | i'm less inclined to think it's corruption and more inclined to think it's afs | 01:03 |
*** priteau has quit IRC | 01:04 | |
pabelanger | sure, lets reboot | 01:04 |
*** jcoufal_ has joined #openstack-infra | 01:04 | |
tonyb | fungi: Yeah. I'm going to try with an empty one ;P | 01:04 |
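For context, a repo-local bindep.txt of the sort being discussed is just a list of distro packages with optional platform markers; the example below is purely illustrative (package names are assumptions, not what the election repo merged):

```bash
# Illustrative only: a minimal bindep.txt replacing the huge fallback
# list; the bracketed markers select the package-manager family.
cat > bindep.txt <<'EOF'
libxml2-dev [platform:dpkg]
libxslt1-dev [platform:dpkg]
libxml2-devel [platform:rpm]
libxslt-devel [platform:rpm]
EOF

# Locally (or in a job) this lists whichever of those are still missing:
bindep -b -f bindep.txt
```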
jeblair | the only other thing i see running is npm-mirror-update running since apr 14 | 01:04 |
jeblair | i think i should just issue 'reboot' now. any objections? | 01:05 |
pabelanger | ++ | 01:05 |
jeblair | and there goes bandersnatch. | 01:05 |
jeblair | i'll wait till it's done, then reboot immediately. | 01:05 |
*** liusheng has joined #openstack-infra | 01:05 | |
pabelanger | dmsimard: I agree with comment, tower has a lot of moving parts | 01:05 |
jeblair | i'm not going to cast any stones at CI/CD systems for having lots of moving parts. | 01:06 |
jeblair | rebooting | 01:06 |
SamYaple | oh you just found the issue... | 01:06 |
*** kiennt26 has joined #openstack-infra | 01:06 | |
SamYaple | i was going to pop on to say ovh is a valid mirror | 01:06 |
SamYaple | it just looks like you have newer packages than ovh has already installed | 01:07 |
SamYaple | always the slow poke | 01:07 |
jeblair | SamYaple: yep! | 01:07 |
jeblair | SamYaple: all mirrors are old | 01:07 |
pabelanger | mirror-update.o.o back | 01:07 |
*** jcoufal has quit IRC | 01:07 | |
jeblair | all images are new, except rax. so rax is the only thing working now (because we're unable to upload there atm) | 01:07 |
jeblair | pabelanger: you want to do the rerpreprepepro thing? | 01:07 |
pabelanger | jeblair: yah | 01:08 |
*** sbezverk has joined #openstack-infra | 01:08 | |
SamYaple | well my gates are working too, but thats because i build everything in docker containers | 01:08 |
SamYaple | thats what got me looking down the versions-too-new path | 01:08 |
jeblair | the systemic fix is to build our images with our mirrors so they can't get ahead of each other | 01:08 |
SamYaple | yea | 01:09 |
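One way that systemic fix is commonly implemented, sketched under the assumption that the images come from diskimage-builder's Ubuntu elements (the real nodepool configuration is not shown in this log): point the image build at the same mirror the jobs use, so the indexes baked into the image can never get ahead of it.

```bash
# Assumed values; the per-cloud mirror hostname and release differ.
export DIB_RELEASE=xenial
export DIB_DISTRIBUTION_MIRROR=http://mirror.example.openstack.org/ubuntu

# ubuntu-minimal honours DIB_DISTRIBUTION_MIRROR for the sources.list
# used during the build, so packages installed into the image come from
# the same snapshot the jobs later install from.
disk-image-create -o ubuntu-xenial ubuntu-minimal vm
```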
*** namnh has quit IRC | 01:09 | |
dmsimard | SamYaple: I know I wanted to ask you something earlier but I forget what :/ | 01:09 |
mnaser | question | 01:09 |
mnaser | dont we want to fix the upload-to-rax problem first | 01:10 |
pabelanger | okay, reprepro running now | 01:10 |
jeblair | pabelanger: that is a lot of 'v's :) | 01:10 |
pabelanger | moar v's | 01:10 |
mnaser | or otherwise we'll have a significantly smaller portion of ci that is functioning | 01:10 |
pabelanger | reading '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_xenial-security_main_amd64_Packages' | 01:10 |
pabelanger | last thing in console ATM | 01:10 |
SamYaple | jeblair: another option would be to just run apt-get with the option "-t=xenial" as that will stomp and downgrade things as needed | 01:10 |
*** hemna_ has quit IRC | 01:10 | |
*** yamahata has quit IRC | 01:10 | |
SamYaple | that might cause other problems though, something to keep in mind | 01:11 |
jeblair | mnaser: we generally expect images to be out of date -- we try not to rely on them being current | 01:11 |
SamYaple | it comes in handy in a pinch | 01:11 |
mnaser | jeblair gotcha, and actually i realized that curl will update gracefully if the mirrors are okay now | 01:11 |
SamYaple | dmsimard: was it "SamYaple: how are you so successful and attractive?" | 01:11 |
mnaser | (of course i always realize these things after speaking up) | 01:11 |
pabelanger | file looks valid | 01:11 |
tonyb | https://review.openstack.org/#/c/511365/ \o/ Possibly more than needed and won't work for rpms but I'll merge it anyway | 01:12 |
pabelanger | jeblair: are you seeing anything in strace? | 01:12 |
jeblair | pread(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 90521600) = 4096 | 01:12 |
jeblair | that's the last line | 01:13 |
jeblair | reprepro 2003 root 5u REG 0,25 90628096 2537568 /afs/.openstack.org/mirror/ubuntu/db/checksums.db | 01:13 |
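The inspection pattern used here, collected in one place for reference (PID 2003 and fd 5 come from the paste above; they will differ on any other run):

```bash
PID=2003   # the stuck reprepro process

# Follow the process and any children; no further system calls after a
# pread() full of zero bytes suggests it is spinning in userspace.
sudo strace -f -p "$PID"

# Map the fd in that pread() back to a path, two different ways.
sudo lsof -p "$PID"
ls -l /proc/"$PID"/fd/5

# Userspace spinning with no syscalls shows up as ~100% CPU here.
ps -o pid,pcpu,stat,wchan,cmd -p "$PID"
```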
dmsimard | SamYaple: nope. And it's annoying the hell out of me now :( | 01:13 |
*** liusheng has quit IRC | 01:13 | |
jeblair | i can access that file okay | 01:13 |
jeblair | pabelanger: any chance that's a potentially corrupted file? | 01:13 |
SamYaple | dmsimard: sounds like me yea | 01:14 |
jeblair | pabelanger: reprepro is at 100% cpu | 01:14 |
pabelanger | jeblair: possible, however I believe we can regenerate it with another reprepro command | 01:14 |
pabelanger | let me check man page | 01:14 |
mnaser | btw, not sure if anyone knows this or not but -fF with strace is quite useful | 01:14 |
mnaser | it'll actually hop into subprocesses/threads | 01:14 |
*** cuongnv has quit IRC | 01:14 | |
jeblair | pabelanger: i feel like 100% cpu and no system call activity after reading a bunch of null data from a file looks a lot like "infinite loop because of bad data" | 01:14 |
jeblair | mnaser: i used -f but not -ff | 01:15 |
mnaser | jeblair im old, man says => "This option is now obsolete and it has the same functionality as -f." | 01:15 |
mnaser | old typing habits die hard i guess | 01:15 |
*** liusheng has joined #openstack-infra | 01:15 | |
pabelanger | reprepro collectnewchecksums | 01:15 |
pabelanger | I think that is the command | 01:15 |
pabelanger | jeblair: yah, seems to make sense | 01:15 |
ianw | pabelanger / jeblair : can we do https://review.openstack.org/#/c/511360/ to unblock system-config, and i'll jump on any further issues? | 01:15 |
pabelanger | reprepro _listchecksums should show what current checksums are | 01:16 |
pabelanger | jeblair: I'm going to kill reprepro and try _listchecksums | 01:16 |
jeblair | pabelanger: ++ | 01:17 |
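The two recovery commands being discussed, as a hedged sketch (same confdir as elsewhere in this log; whether they can do anything useful against a zero-filled database file was exactly the open question):

```bash
CONF=/etc/reprepro/ubuntu

# Dump the checksums reprepro currently knows about; if checksums.db is
# readable this prints one line per pool file.
reprepro --confdir "$CONF" _listchecksums | head

# Recalculate and record any checksums missing from checksums.db.
reprepro --confdir "$CONF" collectnewchecksums
```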
jeblair | ianw: +2 | 01:17 |
dmsimard | SamYaple: OH I remember now | 01:18 |
*** baoli has joined #openstack-infra | 01:18 | |
dmsimard | SamYaple: remember how we discussed bindep supporting different dances for sources and things | 01:18 |
pabelanger | appears to be running, will let it finish | 01:18 |
jeblair | ianw, pabelanger: i need to afk. aiui next steps are 1) fix pip stuff 2) fix reprepro and release the mirror 3) approve and enqueue 511260 into zuulv2 gate | 01:18 |
dmsimard | SamYaple: or different profiles and such | 01:19 |
jeblair | i think if we do all of those things, we can zuulv3? | 01:19 |
SamYaple | dmsimard: https://review.openstack.org/#/c/506502/ ? | 01:19 |
pabelanger | jeblair: okay, I'll keep working on reprepro | 01:19 |
SamYaple | (please some one +3 that patch, im begging) | 01:19 |
dmsimard | SamYaple: I was wondering whether that was still relevant with zuul v3, considering roles (and their dependencies) should likely be self contained | 01:19 |
dmsimard | so if you need something in a role, it should likely be installed inside that role | 01:19 |
SamYaple | dmsimard: so actually that patch is to use bindep in docker containers for image building (which we are currently doing in LOCI) | 01:20 |
SamYaple | significantly different use case to the gate | 01:20 |
dmsimard | SamYaple: oh, huh, interesting. | 01:20 |
SamYaple | dmsimard: https://github.com/openstack/loci/blob/master/bindep.txt | 01:20 |
dmsimard | SamYaple: sort of makes sense I guess | 01:21 |
SamYaple | dmsimard: it makes image building very very clean. if i can get the bindep syntax changed from the above patch, then i can do https://review.openstack.org/#/c/506823/3/bindep.txt | 01:21 |
dmsimard | I wouldn't have thought about bindep for installing packages in containers :p | 01:21 |
SamYaple | which is even more better | 01:21 |
SamYaple | well its great because its one stop for all rpm/deb/pacman/emerge | 01:21 |
SamYaple | and with different architectures | 01:22 |
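A sketch of the container-image pattern being described (the "keystone" profile name and the apt-based base image are assumptions, not the actual LOCI build):

```bash
# During the image build: install only what bindep says is missing for
# the selected profile, then drop the apt lists to keep the layer small.
apt-get update
pip install bindep
bindep -b keystone | xargs -r apt-get install -y --no-install-recommends
rm -rf /var/lib/apt/lists/*
```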
dmsimard | not unlike ansible but I guess ansible is more verbose | 01:22 |
ianw | jeblair: ok, i'll try to push all that along | 01:22 |
SamYaple | no duplication in the case where there are same-named packages across multiple distros | 01:22 |
dmsimard | I should look at ansible-container again, I try to look at least once every 3 months | 01:22 |
SamYaple | all in all, weve been very happy with it | 01:22 |
dmsimard | they haven't yet fulfilled my dream | 01:22 |
SamYaple | heh you and me both | 01:22 |
ianw | the logs i assume we're just pruning as fast as we can | 01:23 |
dmsimard | SamYaple: https://github.com/ansible/ansible-container/issues/399#issuecomment-316109193 | 01:23 |
*** mrunge has quit IRC | 01:24 | |
SamYaple | dmsimard: i actually dont mind running systemd as pid 1. docker did/does have the pid reaping problem (there are docker daemon options to fix that now) | 01:24 |
SamYaple | but i dont want to include systemd in my image because it adds like 80mb | 01:24 |
SamYaple | im building keystone in less than a 40mb layer. i dont need to triple that | 01:25 |
dmsimard | SamYaple: oh, it's just a bit awkward to do it but that's not ansible-container's fault | 01:25 |
*** cuongnv has joined #openstack-infra | 01:25 | |
openstackgerrit | Tony Breeds proposed openstack-infra/irc-meetings master: bindep: Supply a bindep.txt file to avoid the 'global' set https://review.openstack.org/511369 | 01:25 |
SamYaple | its a cool idea, i just find it hard to imagine it becoming practical | 01:25 |
*** yamamoto has quit IRC | 01:26 | |
dmsimard | SamYaple: the use case is mostly to take a role that already exists and works with modern distros and use it to build an image with ansible-container | 01:26 |
dmsimard | but there's all sort of things that make this awkward | 01:26 |
SamYaple | yes. and that i agree with as a migration step | 01:26 |
SamYaple | but i dont really like it as a "long-term" solution | 01:27 |
dmsimard | migration ? there's no migration, if you want to install on a bare metal, vm or container image you use the same role with the same params and everything :p | 01:27 |
SamYaple | no i get that, im just not sure i get the benefit at that point is *my* point | 01:27 |
*** baoli has quit IRC | 01:27 | |
fungi | s/migration/cross-platform portability/ ? | 01:27 |
dmsimard | fungi: :) | 01:28 |
dmsimard | If I do a service task in ansible that says start the service, it better be able to start the darn service :) | 01:28 |
*** kiennt26 has quit IRC | 01:29 | |
SamYaple | i do understand the feeling :) | 01:29 |
dmsimard | in the meantime, I'll keep cursing at this elk container thing | 01:31 |
openstackgerrit | Merged openstack-infra/irc-meetings master: bindep: Supply a bindep.txt file to avoid the 'global' set https://review.openstack.org/511369 | 01:32 |
pabelanger | ianw: 260GB is the size of the ubuntu mirror | 01:33 |
pabelanger | would take a bit to re-mirror i think | 01:34 |
*** vhosakot has quit IRC | 01:38 | |
*** fanzhang has joined #openstack-infra | 01:38 | |
*** baoli has joined #openstack-infra | 01:38 | |
*** kiennt26 has joined #openstack-infra | 01:41 | |
*** ijw has quit IRC | 01:41 | |
*** ijw has joined #openstack-infra | 01:41 | |
ianw | :/ | 01:42 |
pabelanger | currently running reprepro export xenial | 01:42 |
pabelanger | in an effort to see if we can regenerate | 01:42 |
*** larainema has joined #openstack-infra | 01:43 | |
*** kiennt26 has quit IRC | 01:43 | |
*** baoli has quit IRC | 01:43 | |
*** kiennt26 has joined #openstack-infra | 01:43 | |
*** ijw has quit IRC | 01:44 | |
*** sdague has quit IRC | 01:44 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add documentation on force-merging a change https://review.openstack.org/511248 | 01:45 |
*** jcoufal_ has quit IRC | 01:45 | |
*** kiennt26 has quit IRC | 01:46 | |
*** kiennt26 has joined #openstack-infra | 01:46 | |
*** kiennt26 has quit IRC | 01:47 | |
*** kaisers has quit IRC | 01:48 | |
*** kaisers has joined #openstack-infra | 01:49 | |
*** psachin has joined #openstack-infra | 01:49 | |
*** kiennt26 has joined #openstack-infra | 01:49 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add documentation on force-merging a change https://review.openstack.org/511248 | 01:50 |
*** edmondsw has joined #openstack-infra | 01:51 | |
*** rosmaita has quit IRC | 01:53 | |
*** nikhil has quit IRC | 01:54 | |
*** hongbin has joined #openstack-infra | 01:55 | |
openstackgerrit | Merged openstack-infra/project-config master: Online inap-mtl01 region https://review.openstack.org/511328 | 01:57 |
*** kukacz has quit IRC | 02:00 | |
*** dhinesh has joined #openstack-infra | 02:00 | |
*** kukacz has joined #openstack-infra | 02:01 | |
openstackgerrit | Merged openstack-infra/project-config master: Revert "Pin pip to <8 for openstackci-beaker jobs" https://review.openstack.org/511360 | 02:02 |
*** thorst has joined #openstack-infra | 02:02 | |
ianw | ok, will try system-config in a bit with ^ and see where we're at | 02:03 |
*** thorst has quit IRC | 02:07 | |
*** baoli has joined #openstack-infra | 02:07 | |
*** jascott1 has quit IRC | 02:08 | |
*** jascott1 has joined #openstack-infra | 02:08 | |
*** hichihara has joined #openstack-infra | 02:10 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Create alternate time for Neutron Drivers meeting https://review.openstack.org/511293 | 02:11 |
*** baoli has quit IRC | 02:12 | |
*** jascott1 has quit IRC | 02:12 | |
*** kiennt26 has quit IRC | 02:13 | |
*** kiennt26 has joined #openstack-infra | 02:15 | |
*** thorst has joined #openstack-infra | 02:17 | |
dmsimard | clarkb: I'm done fighting with my local elk instance for tonight, I wanted to test the type things we've talked about.. I'll look at it some more tomorrow. | 02:18 |
*** thorst has quit IRC | 02:19 | |
dmsimard | It might be that I'm testing with stuff that is too up to date compared to what we're running on logstash.o.o. | 02:20 |
clarkb | oh ya we are old for reasons | 02:20 |
clarkb | mostly of the javascript variety | 02:21 |
*** gildub has quit IRC | 02:25 | |
*** baoli has joined #openstack-infra | 02:25 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Update Neutron team meeting chairperson https://review.openstack.org/511303 | 02:28 |
pabelanger | clarkb: still working on getting the ubuntu mirror working again, it is possible we may need to rebuild it from scratch, but doing so might take ~2 days. We have 260GB to deal with | 02:28 |
pabelanger | I don't plan on deleting anything, but something we might need to dig into in the morning | 02:28 |
ianw | can i help? | 02:29 |
clarkb | pabelanger: do we think it is reprepro or afs or both? | 02:29 |
pabelanger | ianw: right now, waiting for reprepro export to finish, then going to try running update again | 02:29 |
pabelanger | ianw: however, I might have to pass off to you shortly, getting late | 02:29 |
pabelanger | clarkb: we rebooted mirror-update to make sure afs was good | 02:29 |
pabelanger | but same issues | 02:30 |
pabelanger | current hope is export regenerates everything we need | 02:30 |
pabelanger | so update cmd works | 02:30 |
pabelanger | its been reading files on disk for a while now | 02:30 |
clarkb | export of reprepro? | 02:30 |
pabelanger | https://mirrorer.alioth.debian.org/reprepro.1.html | 02:31 |
pabelanger | yah | 02:31 |
*** liujiong has joined #openstack-infra | 02:31 | |
pabelanger | reprepro export xenial | 02:31 |
*** gouthamr has quit IRC | 02:31 | |
clarkb | gotcha thats like an in place rebuild | 02:32 |
pabelanger | yah, I hope | 02:33 |
*** coolsvap has joined #openstack-infra | 02:33 | |
pabelanger | I also just found https://github.com/esc/reprepro/blob/master/docs/recovery | 02:34 |
*** srobert has joined #openstack-infra | 02:38 | |
*** markvoelker has quit IRC | 02:39 | |
ianw | the only other thing i can think to do as a prophylactic is maybe attach a volume and get the data on afs01 so if it needs to be imported, it's there? | 02:39 |
ianw | but if it slows things down, it would be even worse | 02:39 |
*** mrunge has joined #openstack-infra | 02:39 | |
*** andreas_s has joined #openstack-infra | 02:40 | |
*** gildub has joined #openstack-infra | 02:42 | |
*** srobert has quit IRC | 02:43 | |
*** gcb has joined #openstack-infra | 02:43 | |
*** dfflanders has joined #openstack-infra | 02:44 | |
pabelanger | Oh interesting | 02:48 |
pabelanger | http://paste.openstack.org/show/623401/ | 02:48 |
pabelanger | I just got that on export | 02:48 |
pabelanger | and see some afs warnings in dmesg | 02:49 |
*** andreas_s has quit IRC | 02:49 | |
*** dfflanders has quit IRC | 02:49 | |
*** yamahata has joined #openstack-infra | 02:51 | |
ianw | oh dear | 02:52 |
*** junbo has quit IRC | 02:52 | |
*** edmondsw has quit IRC | 02:53 | |
ianw | things that handle an error from close() are few and far between too | 02:53 |
*** nicolasbock has quit IRC | 02:53 | |
ianw | handle it properly, anyway | 02:53 |
pabelanger | I'm starting down the recovery doc now | 02:54 |
pabelanger | rereference first | 02:54 |
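The first steps of that recovery sequence, sketched from the commands that appear in this log (whether any of them help against a corrupted checksums.db was still unknown at this point):

```bash
CONF=/etc/reprepro/ubuntu

# Rebuild the references database from the distributions' package lists.
reprepro --confdir "$CONF" rereference

# Re-export the index files (Packages/Sources/Release) for one suite.
reprepro --confdir "$CONF" export xenial

# Then retry a normal update.
reprepro -VVV --confdir "$CONF" update
```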
tonyb | project-config is still frozen correct? | 02:57 |
mnaser | tonyb v2 changes likely wont merge, v3 changes welcome afaik | 02:57 |
tonyb | mnaser: okay I was hoping to do both but I can just make a note and do the v3 change after the switch | 02:58 |
*** andreas_s has joined #openstack-infra | 02:59 | |
mnaser | tonyb you can still propose the v3 change and it will be reviewed and merged | 02:59 |
pabelanger | ianw: ya, something appears to be up with AFS | 02:59 |
*** priteau has joined #openstack-infra | 03:00 | |
ianw | i've been pinging afs01 from mirror-update | 03:00 |
ianw | no dropped packets, a few quite high spikes though (~8ms) | 03:00 |
tonyb | mnaser: okay, I'll think on that for a bit | 03:00 |
ianw | seeing as basically nothing has changed, gotta feel like it's network between the two | 03:00 |
pabelanger | ianw: okay, I have to call it. But xenial-updates and xenial-security both have issues | 03:01 |
pabelanger | xenial and xenial-backports update properly | 03:01 |
pabelanger | so, it is possible we could just try first deleting ubuntu-security from reprepro, then mirror it | 03:01 |
pabelanger | then, if it works, we do the same for -updates | 03:02 |
*** mtreinish has quit IRC | 03:02 | |
pabelanger | ianw: good luck, I'll read up on backscroll in the morning | 03:02 |
ianw | alright, let me think about it before i do anything :) | 03:03 |
*** hichihara has quit IRC | 03:03 | |
*** priteau has quit IRC | 03:05 | |
*** mtreinish has joined #openstack-infra | 03:07 | |
ianw | interesting, we don't make /var/run/reprepro on reboot i guess | 03:10 |
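On hosts where /var/run is a tmpfs it is emptied on every reboot, so a lock directory created once by hand disappears; a hedged sketch of two conventional fixes (the system-config change proposed just below may well do this differently, e.g. via puppet):

```bash
# Simplest: recreate the directory at the top of the mirror script.
mkdir -p /var/run/reprepro

# On systemd-based hosts the same thing can be declared once instead:
cat > /etc/tmpfiles.d/reprepro.conf <<'EOF'
d /var/run/reprepro 0755 root root -
EOF
systemd-tmpfiles --create /etc/tmpfiles.d/reprepro.conf
```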
*** kiennt26 has quit IRC | 03:11 | |
*** gouthamr has joined #openstack-infra | 03:12 | |
*** andreas_s has quit IRC | 03:12 | |
*** thorst has joined #openstack-infra | 03:12 | |
*** thorst has quit IRC | 03:12 | |
*** kiennt26 has joined #openstack-infra | 03:16 | |
*** baoli has quit IRC | 03:17 | |
*** Srinivas has joined #openstack-infra | 03:20 | |
Srinivas | hi all, i am facing this while running jobs in jenkins: "ERROR! Unexpected Exception: 'module' object has no attribute '_vendor'" - does anyone know this issue? | 03:20 |
*** links has joined #openstack-infra | 03:25 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Create flock directories in /var/run https://review.openstack.org/511380 | 03:28 |
*** udesale has joined #openstack-infra | 03:35 | |
*** yamamoto has joined #openstack-infra | 03:39 | |
*** yamamoto_ has joined #openstack-infra | 03:44 | |
*** yamamoto has quit IRC | 03:47 | |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy job from Oslo.log https://review.openstack.org/511384 | 03:52 |
*** dave-mccowan has quit IRC | 03:54 | |
*** ykarel|afk has joined #openstack-infra | 03:56 | |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/openstack-zuul-jobs master: Remove Oslo.log legacy job https://review.openstack.org/511385 | 03:56 |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/openstack-zuul-jobs master: Remove Oslo.log legacy job https://review.openstack.org/511385 | 03:58 |
openstackgerrit | Nam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy job from Oslo.log https://review.openstack.org/511384 | 04:01 |
*** sree has joined #openstack-infra | 04:03 | |
*** yamamoto_ has quit IRC | 04:03 | |
*** hongbin has quit IRC | 04:04 | |
ianw | pabelanger: ok, as suggested, i removed xenial-updates & xenial-security -- i dropped them from distributions (/etc/reprepro/ubuntu/distributions.ianw) and ran $REPREPRO --delete clearvanished | 04:04 |
ianw | it seemed to remove a bunch of things | 04:04 |
ianw | see logs in /tmp/ianw/out.log (sorry it's just a huge stream) | 04:04 |
ianw | i put them back, and am rerunning a "normal" update | 04:04 |
SamYaple | ianw: are you saying that updates and security wont be mirrored anymore? | 04:04 |
SamYaple | oh ok | 04:05 |
SamYaple | phew. scared me for a second | 04:05 |
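The procedure ianw describes, written out as a sketch (the temporarily edited distributions file is specific to this incident; paths as used elsewhere in this log):

```bash
CONF=/etc/reprepro/ubuntu

# 1. Temporarily remove the xenial-updates and xenial-security stanzas
#    from the distributions file (done by hand above).

# 2. Drop everything reprepro tracks for suites that no longer appear in
#    that file, deleting now-unreferenced pool files as well.
reprepro --delete --confdir "$CONF" clearvanished

# 3. Restore the stanzas and re-run a normal update so the two suites
#    are mirrored again from scratch.
reprepro -VVV --confdir "$CONF" update
```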
ianw | i don't know what it's doing, it's sitting there at 100% cpu with -VVV not saying anything | 04:05 |
ianw | i am going to go do something else for about 45 minutes and not look at it, see if something happens | 04:05 |
SamYaple | ianw: you dont think this has to do with the docker mirror we added do you? | 04:05 |
SamYaple | seems like a lot of this started right after that | 04:06 |
ianw | SamYaple: no, my guess is that transient network errors have introduced AFS issues, which have corrupted reprepro's state somehow | 04:06 |
SamYaple | got it | 04:06 |
ianw | the only thing more obscure than AFS internals is reprepro internals, which makes for an interesting combo | 04:06 |
SamYaple | :) | 04:07 |
SamYaple | i really have to finish my apt mirroring utility | 04:07 |
SamYaple | ive never found a really good one | 04:07 |
SamYaple | and i like to push to my ceph radosgw without having an intermediate clone locally, which *nothing* does | 04:07 |
*** armax has quit IRC | 04:08 | |
*** armax has joined #openstack-infra | 04:08 | |
*** armax has quit IRC | 04:08 | |
*** armax has joined #openstack-infra | 04:09 | |
SamYaple | is there something on paper about how infra is going to solve the unsigned mirrors for apt issue? | 04:09 |
*** armax has quit IRC | 04:09 | |
SamYaple | we could just re-sign the Release file after the mirroring | 04:09 |
*** armax has joined #openstack-infra | 04:10 | |
ianw | i don't think there's anything to solve, i don't think we want it signed to avoid it being used as public mirrors | 04:10 |
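For context, this is how an unsigned internal mirror is typically consumed on the client side; the hostname below is an assumption, not necessarily what configure-mirrors actually writes:

```bash
# Marking the internal mirror as trusted makes apt accept its unsigned
# Release files; leaving the mirror unsigned in turn discourages anyone
# outside the CI from pointing at it.
cat > /etc/apt/sources.list <<'EOF'
deb [trusted=yes] http://mirror.regionone.example.openstack.org/ubuntu xenial main universe
deb [trusted=yes] http://mirror.regionone.example.openstack.org/ubuntu xenial-updates main universe
deb [trusted=yes] http://mirror.regionone.example.openstack.org/ubuntu xenial-security main universe
EOF
apt-get update
```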
*** armax has quit IRC | 04:10 | |
ianw | as jeblair noted, you can't seem to strace this process. or at least it doesn't seem to be doing anything | 04:10 |
ianw | i installed gdb in a hail-mary to see if i can see what's going on | 04:10 |
Srinivas | SamYaple: hi all, i am facing this while running jobs in jenkins: "ERROR! Unexpected Exception: 'module' object has no attribute '_vendor'" - does anyone know this issue? | 04:10 |
ianw | i haven't bothered with symbols -> http://paste.openstack.org/show/623403/ | 04:11 |
ianw | it's somewhere doing something in db code every time | 04:11 |
ianw | the dbs it has open are | 04:12 |
ianw | reprepro 17829 root 5u REG 0,25 42790912 2537578 /afs/.openstack.org/mirror/ubuntu/db/references.db | 04:12 |
ianw | reprepro 17829 root 6u REG 0,25 90628096 2537568 /afs/.openstack.org/mirror/ubuntu/db/checksums.db | 04:12 |
ianw | reprepro 17829 root 7u REG 0,25 485736448 2537576 /afs/.openstack.org/mirror/ubuntu/db/contents.cache.db | 04:12 |
ianw | i think if they are corrupt, we are SOL basically | 04:12 |
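Since reprepro's *.db files are Berkeley DB databases, one further low-level check is to copy a suspect file off AFS and run the db utilities against it; a sketch, assuming the db-util tools are installed (binary names vary slightly between db5.x packages):

```bash
# Copy the suspect database off AFS first so the check itself cannot
# hang on the filesystem.
cp /afs/.openstack.org/mirror/ubuntu/db/checksums.db /tmp/checksums.db

# A clean exit means the file structure is intact; errors point at
# real corruption rather than an AFS/client problem.
db_verify /tmp/checksums.db || echo "checksums.db is damaged"

# db_stat can show whether the file is mostly empty pages, which would
# match the long run of zero bytes seen in the strace output above.
db_stat -d /tmp/checksums.db | head
```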
clarkb | ianw: I'm just happy that "hail mary" as a long-shot term transcends 'murican football | 04:13 |
SamYaple | would it be so wrong to purge it all and completely resync? | 04:13 |
ianw | this might have been a hospital pass from pabelanger :) | 04:13 |
SamYaple | i know it will take time | 04:13 |
ianw | not sure if that term transcends | 04:13 |
ianw | it's 250something gb over afs ... that is our last option | 04:14 |
ianw | if i had to learn something from this now, it's that i think we should get things pointing to reverse proxies | 04:15 |
*** claudiub|2 has joined #openstack-infra | 04:15 | |
ianw | that way, we can at least roll out a config to point it to upstream if this happens again | 04:15 |
SamYaple | reverse proxies work great for non-https things | 04:15 |
SamYaple | but some repos are only https | 04:15 |
clarkb | you can totally http -> https | 04:16 |
SamYaple | yea thats true | 04:16 |
SamYaple | i guess that wouldn't be so bad, we are already doing custom urls. wouldn't be much different | 04:16 |
clarkb | ianw we could do that if we need to | 04:16 |
SamYaple | as far as a workflow goes i mean | 04:17 |
clarkb | in this case can we get by if we rebuild images against the mirror while we rebuild it? | 04:19 |
ianw | clarkb: i think reverse proxies would be more reliable | 04:20 |
*** yamamoto has joined #openstack-infra | 04:21 | |
ianw | of course right now, we can't merge system-config until https://review.openstack.org/511360 is deployed. i don't know why zuul hasn't reconfigured, it seems like it's been ages | 04:22 |
SamYaple | i do apt-cacher-ng at my house with pretty good success | 04:22 |
SamYaple | i would be ok with reverse proxies | 04:22 |
SamYaple | it would save a great deal of space too | 04:23 |
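A rough shape of the reverse-proxy idea, assuming Apache with mod_proxy and mod_cache on the mirror hosts (hostname and upstream are illustrative, not a tested config): jobs keep talking plain http to a local name while the proxy fetches from the real, possibly https-only, upstream and caches on disk.

```bash
cat > /etc/apache2/sites-available/mirror-proxy.conf <<'EOF'
<VirtualHost *:80>
    ServerName mirror.regionone.example.openstack.org

    # Talk https to the upstream while serving plain http locally.
    SSLProxyEngine on
    ProxyPass        /ubuntu/ https://upstream.example.com/ubuntu/
    ProxyPassReverse /ubuntu/ https://upstream.example.com/ubuntu/

    # Cache fetched packages on disk so repeated downloads stay local.
    CacheEnable disk /ubuntu/
    CacheRoot /var/cache/apache2/proxy
</VirtualHost>
EOF
a2enmod proxy proxy_http ssl cache cache_disk
a2ensite mirror-proxy
service apache2 reload
```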
*** markvoelker has joined #openstack-infra | 04:39 | |
*** edmondsw has joined #openstack-infra | 04:39 | |
*** edmondsw has quit IRC | 04:44 | |
adriant | are we still having issues with Zuul? I've got a patch I +2'd and +1 workflowed and it doesn't seem to want to merge :( | 04:45 |
adriant | https://review.openstack.org/#/c/509016/ | 04:45 |
*** bhavik1 has joined #openstack-infra | 04:49 | |
clarkb | adriant: it needs to be +1'd by jenkins first. a recheck should get it going | 04:50 |
adriant | clarkb: ty! | 04:50 |
adriant | clarkb: although I'd have assumed the zuul +1 was enough :( | 04:51 |
*** stakeda has joined #openstack-infra | 04:53 | |
ykarel|afk | clarkb, why hasn't jenkins posted a +1 on https://review.openstack.org/#/c/510735/, any idea? | 04:55 |
clarkb | it wouldve been if we managed to keep using zuulv3 for gating but we had to roll back | 04:55 |
*** gouthamr has quit IRC | 04:56 | |
*** thorst has joined #openstack-infra | 04:56 | |
ykarel|afk | the patch has workflow +1 but gate jobs are not running | 04:56 |
clarkb | ykarel|afk: it did, look at the comments (toggle ci if you need to) | 04:56 |
*** ykarel|afk is now known as ykarel | 04:56 | |
ykarel | clarkb, yes it's there but gate jobs are not there in http://status.openstack.org/zuul/ | 04:58 |
ykarel | not running | 04:58 |
clarkb | it is there when I look | 04:59 |
ykarel | clarkb, yes it's there, sorry | 05:00 |
ykarel | i mislooked | 05:00 |
*** thorst has quit IRC | 05:00 | |
*** priteau has joined #openstack-infra | 05:00 | |
*** bhavik1 has quit IRC | 05:01 | |
*** eumel8 has joined #openstack-infra | 05:03 | |
*** priteau has quit IRC | 05:05 | |
ykarel | clarkb, how tarballs are pushed, is there some issue, i cannot find https://tarballs.openstack.org/puppet-tripleo/puppet-tripleo-5.6.4.tar.gz | 05:05 |
ianw | ok, i think reprepro is dead, nothing has happened | 05:07 |
ykarel | looks like there is some issue that's why some reverts are going: https://review.openstack.org/#/q/status:merged+project:openstack/releases+branch:master+topic:newton/tripleo | 05:08 |
*** CHIPPY has joined #openstack-infra | 05:11 | |
*** markvoelker has quit IRC | 05:14 | |
*** stakeda has quit IRC | 05:18 | |
*** ykarel_ has joined #openstack-infra | 05:20 | |
*** ykarel has quit IRC | 05:22 | |
ianw | i sent out a note in reply to mordred. i'm running out of ideas if i can't get system-config changes merged | 05:29 |
*** jtomasek has joined #openstack-infra | 05:33 | |
*** bhavik1 has joined #openstack-infra | 05:35 | |
*** CHIPPY has quit IRC | 05:36 | |
*** mrunge has quit IRC | 05:44 | |
*** eumel8 has quit IRC | 05:45 | |
*** cshastri has joined #openstack-infra | 05:45 | |
*** threestrands has quit IRC | 05:48 | |
*** dhajare has joined #openstack-infra | 05:53 | |
*** e0ne has joined #openstack-infra | 05:55 | |
*** eumel8 has joined #openstack-infra | 05:55 | |
*** lewo has joined #openstack-infra | 05:56 | |
*** e0ne has quit IRC | 05:59 | |
*** udesale__ has joined #openstack-infra | 06:03 | |
*** martinkopec has joined #openstack-infra | 06:03 | |
*** udesale has quit IRC | 06:03 | |
*** udesale has joined #openstack-infra | 06:06 | |
*** mrunge has joined #openstack-infra | 06:06 | |
*** sshnaidm|off is now known as sshnaidm | 06:06 | |
*** udesale__ has quit IRC | 06:07 | |
*** martinkopec has quit IRC | 06:08 | |
*** martinkopec has joined #openstack-infra | 06:09 | |
*** markvoelker has joined #openstack-infra | 06:11 | |
*** kjackal_ has joined #openstack-infra | 06:12 | |
*** bhavik1 has quit IRC | 06:16 | |
*** pahuang has quit IRC | 06:18 | |
*** yamahata has quit IRC | 06:20 | |
ianw | ok, stracing reprepro the last entry is | 06:23 |
ianw | 3170 pread(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 90521600) = 4096 | 06:23 |
ianw | which lsof tells me | 06:23 |
ianw | reprepro 3170 root 6u REG 0,25 90628096 2537568 /afs/.openstack.org/mirror/ubuntu/db/checksums.db | 06:23 |
sshnaidm | infra-root, zuulv3 can't install ansible properly and fails, is it a known issue? I didn't see it in the etherpad before: fatal error: openssl/opensslv.h: No such file or directory: http://logs.openstack.org/07/472607/102/check/legacy-tripleo-ci-centos-7-scenario002-multinode-oooq-puppet/a263fa3/job-output.txt.gz#_2017-10-12_06_13_19_532007 | 06:24 |
*** pahuang has joined #openstack-infra | 06:27 | |
*** edmondsw has joined #openstack-infra | 06:28 | |
ianw | i'm running "find pool -type f -print | reprepro --confdir /etc/reprepro/ubuntu -b . _detect" which can hopefully recreate it? | 06:29 |
*** pgadiya has joined #openstack-infra | 06:30 | |
*** Swami has joined #openstack-infra | 06:31 | |
*** edmondsw has quit IRC | 06:32 | |
ianw | oh jeez, if this has to checksum the whole thing, over afs ... | 06:32 |
ianw | it's up to 1mb, the old file was 80mb | 06:33 |
*** Swami has quit IRC | 06:33 | |
AJaeger | oops ;( | 06:33 |
ianw | so let's say 5 minutes a megabyte, 5*80 == 6 hours? | 06:34 |
*** yamahata has joined #openstack-infra | 06:34 | |
ianw | the old checksum file is still there | 06:34 |
*** udesale has quit IRC | 06:34 | |
AJaeger | jlk: something is wrong with the periodic translation jobs, see http://logs.openstack.org/periodic/git.openstack.org/openstack/glance/stable/newton/propose-translation-update/2302cc1/ - I expected that to be a *master* job since we only converted master... | 06:35 |
*** rcernin has joined #openstack-infra | 06:35 | |
*** udesale has joined #openstack-infra | 06:35 | |
AJaeger | jlk: That one failed to install packages as well. Missing root? | 06:37 |
*** armaan has joined #openstack-infra | 06:38 | |
*** srobert has joined #openstack-infra | 06:40 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Install zanata dependencies as root https://review.openstack.org/511396 | 06:40 |
AJaeger | jlk, ianw, quick fix for the second problem ^ | 06:40 |
ianw | AJaeger: see my notes though, not sure how much will merge | 06:41 |
*** markvoelker has quit IRC | 06:44 | |
*** dhinesh has quit IRC | 06:44 | |
*** srobert has quit IRC | 06:44 | |
eumel8 | AJaeger: There are more tasks in this role which require root | 06:45 |
*** pgadiya has quit IRC | 06:45 | |
AJaeger | eumel8: want to do a followup fix? | 06:46 |
chandankumar | ianw: hello | 06:46 |
AJaeger | ianw: those jobs look fine - but in general I agree ;( Thanks for wading through it | 06:47 |
chandankumar | ianw: how to add initial core reviewers for this http://git.openstack.org/cgit/openstack/python-tempestconf/ ? we need to add 4 people for the same. | 06:47 |
ianw | chandankumar: i added you as core, you can now add as you want | 06:49 |
eumel8 | AJaeger: just wondering if this full playbook doesn't run as root | 06:49 |
chandankumar | ianw: thanks :-) | 06:49 |
chandankumar | ianw: one more help i need on this review https://review.openstack.org/#/c/511194/ | 06:50 |
AJaeger | eumel8: it does not run as root, see the link above if you want to check | 06:51 |
chandankumar | ianw: https://review.openstack.org/#/admin/groups/1842,members i am not able to add other core reviewers | 06:51 |
chandankumar | ianw: my email-id is chkumar@redhat.com | 06:51 |
AJaeger | chandankumar: log out and in again | 06:52 |
*** priteau has joined #openstack-infra | 06:52 | |
AJaeger | chandankumar: and add first thing the QA PTL, please - the repo is part of QA | 06:52 |
AJaeger | ianw: so, 511396 passed tests | 06:53 |
* AJaeger will be offline for a couple of hours now | 06:54 | |
chandankumar | AJaeger: it is a part of Refstack, i will add hogepodge, but I'm still facing the same issue after logging out and logging in again | 06:54 |
eumel8 | ok | 06:54 |
chandankumar | AJaeger: some people also complain that they are not able to add me as a reviewer | 06:54 |
ianw | infra-root / pabelanger: http://lists.openstack.org/pipermail/openstack-infra/2017-October/005610.html is likely my last update on the reprepro thing. it's currently trying to recreate the checksums.db as described. if that doesn't work, i'm out of ideas for now | 06:55 |
*** priteau has quit IRC | 06:55 | |
*** priteau has joined #openstack-infra | 06:55 | |
ianw | chandankumar: i don't know, i'm almost out. want me to add someone else? | 06:56 |
chandankumar | ianw: please add luigi toscano | 06:56 |
chandankumar | and Chris Hoge | 06:57 |
*** thorst has joined #openstack-infra | 06:57 | |
ianw | chandankumar: what's you account id? | 06:57 |
ianw | click on your name and settings | 06:57 |
chandankumar | ianw: username chkumar246 | 06:58 |
chandankumar | Username | 06:58 |
chandankumar | chkumar246 | 06:58 |
chandankumar | Full Name Chandan Kumar | 06:58 |
chandankumar | Email Address chkumar@redhat.com | 06:58 |
ianw | Account ID below that | 06:58 |
chandankumar | Account ID 12393 | 06:58 |
ianw | Oct 12, 2017 5:48 PM Added Chandan Kumar (8944) | 06:59 |
*** priteau has quit IRC | 06:59 | |
ianw | that's the problem. i think someone will have to manually delete the account. check back in US hours for another infra root, i've got to EOD sorry | 06:59 |
chandankumar | ianw: no problem thanks :-) | 07:00 |
*** thorst_ has joined #openstack-infra | 07:00 | |
chandankumar | take rest, have a nice night ahead :-) | 07:00 |
*** eumel8 has quit IRC | 07:01 | |
*** Dinesh_Bhor has quit IRC | 07:01 | |
*** slaweq has joined #openstack-infra | 07:01 | |
*** thorst has quit IRC | 07:01 | |
*** thorst_ has quit IRC | 07:05 | |
*** Dinesh_Bhor has joined #openstack-infra | 07:07 | |
*** vsaienk0 has joined #openstack-infra | 07:09 | |
*** dingyichen has quit IRC | 07:10 | |
*** Hal has joined #openstack-infra | 07:13 | |
*** Hal is now known as Guest66337 | 07:14 | |
sshnaidm | is the "permission denied" issue back? http://logs.openstack.org/84/508884/1/check/legacy-tripleo-ci-centos-7-nonha-multinode-oooq/f7a99fa/logs/devstack-gate-setup-workspace-new.txt | 07:14 |
sshnaidm | I thought it was solved yesterday | 07:14 |
*** yamahata has quit IRC | 07:15 | |
*** pcaruana has joined #openstack-infra | 07:17 | |
*** florianf has joined #openstack-infra | 07:20 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove unused kuryr-libnetwork jobs https://review.openstack.org/511404 | 07:22 |
*** aviau has quit IRC | 07:23 | |
*** aviau has joined #openstack-infra | 07:24 | |
*** gildub has quit IRC | 07:24 | |
*** armaan has quit IRC | 07:24 | |
*** jpich has joined #openstack-infra | 07:29 | |
*** shardy has joined #openstack-infra | 07:30 | |
*** florianf has quit IRC | 07:32 | |
*** tesseract has joined #openstack-infra | 07:32 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: DNM: test containers update https://review.openstack.org/511175 | 07:32 |
*** florianf has joined #openstack-infra | 07:32 | |
*** rwsu has joined #openstack-infra | 07:33 | |
*** andreas_s has joined #openstack-infra | 07:35 | |
*** ykarel__ has joined #openstack-infra | 07:38 | |
AJaeger | sshnaidm: see the status emails by ianw and monty on openstack-dev, this is not solved yet. You know, when it rains, it pours... ;( | 07:39 |
ethfci | guys i feel it is high time for a 'stop the line'? | 07:40 |
ethfci | Jenkins and Zuul have been dead for days... | 07:40 |
*** ykarel_ has quit IRC | 07:41 | |
*** markvoelker has joined #openstack-infra | 07:41 | |
ethfci | still facing the 'libcurl4-gnutls-dev' issue... | 07:43 |
*** stakeda has joined #openstack-infra | 07:45 | |
*** egonzalez has joined #openstack-infra | 07:45 | |
AJaeger | ethfci: http://lists.openstack.org/pipermail/openstack-dev/2017-October/date.html | 07:46 |
*** hashar has joined #openstack-infra | 07:57 | |
*** d0ugal has joined #openstack-infra | 07:59 | |
*** eumel8 has joined #openstack-infra | 08:00 | |
*** gildub has joined #openstack-infra | 08:03 | |
*** dbecker has joined #openstack-infra | 08:05 | |
openstackgerrit | Tovin Seven proposed openstack-infra/openstack-zuul-jobs master: Remove legacy oslo.db job https://review.openstack.org/511412 | 08:05 |
openstackgerrit | Tovin Seven proposed openstack-infra/project-config master: Remove legacy oslo.db job https://review.openstack.org/511414 | 08:05 |
*** yamamoto has quit IRC | 08:05 | |
*** yamamoto has joined #openstack-infra | 08:08 | |
*** markvoelker has quit IRC | 08:14 | |
*** edmondsw has joined #openstack-infra | 08:15 | |
*** s-shiono has quit IRC | 08:17 | |
*** priteau has joined #openstack-infra | 08:18 | |
*** edmondsw has quit IRC | 08:20 | |
*** shardy has quit IRC | 08:29 | |
*** shardy has joined #openstack-infra | 08:29 | |
kazsh | AJaeger: G'day, got PTL's +1 accordingly, please check https://review.openstack.org/#/c/509119/ | 08:31 |
*** ralonsoh has joined #openstack-infra | 08:33 | |
*** lucas-afk is now known as lucasagomes | 08:34 | |
*** tosky has joined #openstack-infra | 08:35 | |
*** derekh has joined #openstack-infra | 08:38 | |
tosky | AJaeger: hi! Going back to the previous questions about python-tempestconf by chandankumar: I'm adding the missing people, but when the project is approved under refstack, will we add refstack-core instead of specifying for example the PTL directly in python-tempestconf? | 08:49 |
*** spectr has quit IRC | 08:49 | |
AJaeger | tosky: that all depends on how the refstack team wants to have it working ;) | 08:49 |
tosky | ack | 08:50 |
AJaeger | You could add refstack-core (or could have created the repo with reusing the refstack ACLs). | 08:50 |
AJaeger | Or add a subteam of refstack... I would either add refstack-core or the PTL - and let the PTL decide the rest (after discussion with the team obviously) | 08:50 |
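For reference, the ACL-style alternative mentioned here would look roughly like the snippet below in project-config's acl files, granting the existing refstack-core group core rights on the new repo (group and file names are assumptions; the actual choice was left to the refstack team):

```bash
cat > gerrit/acls/openstack/python-tempestconf.config <<'EOF'
[access "refs/heads/*"]
abandon = group refstack-core
label-Code-Review = -2..+2 group refstack-core
label-Workflow = -1..+1 group refstack-core
EOF
```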
AJaeger | kazsh: thanks, will review later | 08:51 |
*** yamamoto has quit IRC | 08:55 | |
tosky | sure, I added the PTL for now | 08:56 |
*** thorst has joined #openstack-infra | 09:02 | |
*** e0ne has joined #openstack-infra | 09:04 | |
*** thorst has quit IRC | 09:05 | |
*** gildub has quit IRC | 09:09 | |
*** jascott1 has joined #openstack-infra | 09:10 | |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Karbor https://review.openstack.org/511432 | 09:12 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in Karbor https://review.openstack.org/511433 | 09:12 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Karbor https://review.openstack.org/511432 | 09:14 |
*** jascott1 has quit IRC | 09:14 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: Configure OVB jobs to use local mirrors for images https://review.openstack.org/511434 | 09:14 |
*** yamamoto has joined #openstack-infra | 09:15 | |
*** yamamoto has quit IRC | 09:15 | |
*** ykarel__ is now known as ykarel|lunch | 09:17 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Use native propose-translation jobs https://review.openstack.org/511435 | 09:17 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Fix propose-translation-update https://review.openstack.org/511436 | 09:17 |
*** spectr has joined #openstack-infra | 09:19 | |
*** ociuhandu has quit IRC | 09:23 | |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Murano https://review.openstack.org/511438 | 09:24 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in Murano https://review.openstack.org/511439 | 09:24 |
*** chenying_ has joined #openstack-infra | 09:25 | |
*** logan- has quit IRC | 09:27 | |
AJaeger | jlk, mordred, pleaes review https://review.openstack.org/511435 https://review.openstack.org/511436 and https://review.openstack.org/511396 - and review whether we need root access in other places as well. Hope that gets us moving forward with translations... | 09:31 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Solum https://review.openstack.org/511440 | 09:31 |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in Solum https://review.openstack.org/511441 | 09:31 |
*** yamamoto has joined #openstack-infra | 09:44 | |
*** kiennt26 has quit IRC | 09:53 | |
*** priteau has quit IRC | 09:53 | |
*** electrofelix has joined #openstack-infra | 10:00 | |
*** thorst has joined #openstack-infra | 10:02 | |
*** liujiong has quit IRC | 10:02 | |
*** sree has quit IRC | 10:02 | |
*** sree has joined #openstack-infra | 10:03 | |
*** egonzalez has quit IRC | 10:03 | |
*** edmondsw has joined #openstack-infra | 10:04 | |
*** andreas_s has quit IRC | 10:06 | |
*** rpittau has quit IRC | 10:06 | |
*** andreas_s has joined #openstack-infra | 10:06 | |
*** rpittau has joined #openstack-infra | 10:06 | |
*** edmondsw has quit IRC | 10:08 | |
*** thorst has quit IRC | 10:08 | |
*** andreas_s has quit IRC | 10:11 | |
*** spectr has quit IRC | 10:12 | |
*** markvoelker has joined #openstack-infra | 10:12 | |
*** pbourke has quit IRC | 10:15 | |
*** ykarel|lunch is now known as ykarel | 10:15 | |
*** egonzalez has joined #openstack-infra | 10:17 | |
*** pbourke has joined #openstack-infra | 10:17 | |
*** andreas_s has joined #openstack-infra | 10:20 | |
*** sbezverk has quit IRC | 10:21 | |
*** spectr has joined #openstack-infra | 10:25 | |
*** sree has quit IRC | 10:26 | |
*** mikal has quit IRC | 10:28 | |
*** thingee has quit IRC | 10:28 | |
*** thingee has joined #openstack-infra | 10:28 | |
*** mikal has joined #openstack-infra | 10:30 | |
*** udesale has quit IRC | 10:30 | |
*** gcb has quit IRC | 10:30 | |
*** panda|rover|off is now known as panda|rover | 10:30 | |
*** andreas_s has quit IRC | 10:33 | |
*** boden has joined #openstack-infra | 10:34 | |
*** andreas_s has joined #openstack-infra | 10:39 | |
*** armaan has joined #openstack-infra | 10:40 | |
*** florianf has quit IRC | 10:40 | |
*** florianf has joined #openstack-infra | 10:40 | |
*** priteau has joined #openstack-infra | 10:42 | |
*** armaan has quit IRC | 10:43 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: Use infra proxy server for trunk.r.o in delorean-deps https://review.openstack.org/508884 | 10:45 |
*** markvoelker has quit IRC | 10:45 | |
*** clayton has quit IRC | 10:49 | |
*** clayton has joined #openstack-infra | 10:51 | |
*** andreas_s has quit IRC | 10:53 | |
*** edmondsw has joined #openstack-infra | 10:53 | |
*** florianf has quit IRC | 10:54 | |
*** andreas_s has joined #openstack-infra | 10:54 | |
*** florianf has joined #openstack-infra | 10:55 | |
*** andreas_s has quit IRC | 10:56 | |
*** andreas_s has joined #openstack-infra | 10:56 | |
*** edmondsw has quit IRC | 10:56 | |
*** logan- has joined #openstack-infra | 10:57 | |
*** sambetts|afk is now known as sambetts | 11:01 | |
*** zoli is now known as zoli|lunch | 11:02 | |
*** zoli|lunch is now known as zoli | 11:02 | |
*** priteau has quit IRC | 11:02 | |
*** priteau has joined #openstack-infra | 11:03 | |
*** dave-mccowan has joined #openstack-infra | 11:04 | |
*** andreas_s has quit IRC | 11:07 | |
*** priteau has quit IRC | 11:08 | |
*** shardy is now known as shardy_lunch | 11:08 | |
*** priteau has joined #openstack-infra | 11:10 | |
*** wolverineav has joined #openstack-infra | 11:10 | |
*** sdague has joined #openstack-infra | 11:11 | |
*** gildub has joined #openstack-infra | 11:12 | |
*** florianf has quit IRC | 11:13 | |
*** florianf has joined #openstack-infra | 11:14 | |
*** priteau has quit IRC | 11:16 | |
*** gmann is now known as gmann_afk | 11:17 | |
*** jkilpatr has joined #openstack-infra | 11:23 | |
*** Srinivas has quit IRC | 11:26 | |
*** cuongnv has quit IRC | 11:26 | |
*** yamamoto has quit IRC | 11:27 | |
*** martinkopec has quit IRC | 11:29 | |
*** priteau has joined #openstack-infra | 11:35 | |
*** andreas_s has joined #openstack-infra | 11:35 | |
*** ykarel has quit IRC | 11:37 | |
*** ykarel has joined #openstack-infra | 11:38 | |
*** nicolasbock has joined #openstack-infra | 11:39 | |
*** andreas_s has quit IRC | 11:39 | |
*** andreas_s has joined #openstack-infra | 11:40 | |
*** markvoelker has joined #openstack-infra | 11:43 | |
*** nicolasbock has quit IRC | 11:45 | |
strigazi | Hello AJaeger, Can we merge magnum's zuulv3 patch? https://review.openstack.org/#/c/508676/ I'm a little lost with all these new fast failures/RETRY_LIMIT | 11:46 |
*** andreas_s has quit IRC | 11:49 | |
*** yamamoto has joined #openstack-infra | 11:50 | |
mordred | infra-root: I've got a doctor's appointment this morning - out for the next few hours | 11:52 |
*** coolsvap has quit IRC | 11:55 | |
AJaeger | mordred: all the best! | 11:55 |
jamespage | morning/afternoon all - we had some issues with installation of libcurl4-gnutls-dev in our check/gate jobs yesterday which I understood were due to an ubuntu archive cache problem | 11:56 |
jamespage | still seeing some of those today - https://review.openstack.org/#/c/504310/ | 11:56 |
jamespage | do we still have some inconsistency somewhere? | 11:56 |
AJaeger | strigazi: the syntax and setup look fine, otherwise Zuul would have complained - it was able to run the jobs successfully. There's no regression from legacy to your jobs, so the migration looks fine. With all the problems, the -1s are to be expected. I think you can merge the change but I doubt you will be able to since there's no +1 by Jenkins yet. | 11:56 |
AJaeger | jamespage: http://lists.openstack.org/pipermail/openstack-dev/2017-October/date.html | 11:57 |
AJaeger | jamespage: basically: the problem is not fixed yet | 11:57 |
*** nicolasbock has joined #openstack-infra | 11:58 | |
*** stakeda has quit IRC | 11:59 | |
*** andreas_s has joined #openstack-infra | 11:59 | |
*** dprince has joined #openstack-infra | 12:00 | |
*** hashar has quit IRC | 12:01 | |
*** trown|outtypewww is now known as trown | 12:02 | |
*** lucasagomes is now known as lucas-hungry | 12:02 | |
*** andreas_s has quit IRC | 12:03 | |
*** andreas_s has joined #openstack-infra | 12:04 | |
*** thorst has joined #openstack-infra | 12:06 | |
*** andreas_s has quit IRC | 12:08 | |
*** andreas_s has joined #openstack-infra | 12:08 | |
*** shardy_lunch is now known as shardy | 12:09 | |
*** edmondsw has joined #openstack-infra | 12:10 | |
*** edmondsw_ has joined #openstack-infra | 12:10 | |
*** gcb has joined #openstack-infra | 12:12 | |
*** edmondsw has quit IRC | 12:14 | |
*** markvoelker has quit IRC | 12:16 | |
sambetts | AJaeger: are the zuul v2 docs completely gone now?? I'm trying to change a configuration in our third party CI and all the zuul docs are for zuul v3 now | 12:19 |
*** lifeless has quit IRC | 12:24 | |
AJaeger | sambetts: do you mean the infra-manual? You can check it out and build locally... | 12:25 |
*** gmann_afk is now known as gmann | 12:26 | |
openstackgerrit | Major Hayden proposed openstack-infra/project-config master: Remove OpenStack/Ceph/Virt repo from CentOS https://review.openstack.org/493003 | 12:26 |
*** hrybacki|trainin is now known as hrybacki | 12:26 | |
sambetts | AJaeger: so it's not published any more? not planning on doing https://docs.openstack.org/infra/zuul/v2 like the other projects have ocata/pike etc | 12:26 |
*** eharney has joined #openstack-infra | 12:28 | |
*** lifeless has joined #openstack-infra | 12:31 | |
*** markvoelker has joined #openstack-infra | 12:33 | |
*** rosmaita has joined #openstack-infra | 12:33 | |
efried | Good morning infra. Is this a known issue? I'm seeing quite a bit of it: http://logs.openstack.org/06/502306/10/check/gate-nova-specs-docs-ubuntu-xenial/e9f94d9/console.html | 12:33 |
*** udesale has joined #openstack-infra | 12:36 | |
*** kgiusti has joined #openstack-infra | 12:36 | |
*** udesale has quit IRC | 12:37 | |
*** udesale has joined #openstack-infra | 12:37 | |
rosmaita | also on https://review.openstack.org/#/c/493654/7 | 12:38 |
rosmaita | actually, https://review.openstack.org/#/c/493654/8 | 12:39 |
*** ociuhandu has joined #openstack-infra | 12:40 | |
*** adarazs is now known as adarazs_brb | 12:41 | |
vsaienk0 | efried: we need to switch to bindep to fix it https://review.openstack.org/#/c/444201/ | 12:41 |
*** links has quit IRC | 12:41 | |
efried | vsaienk0 That needs to be done for every project? | 12:42 |
vsaienk0 | looks like the upstream deb repo is broken, and we install the default package list, which actually is not needed for ironic tests, so adding a bindep file to our project with exact dependencies fixes the problem | 12:42 |
eumel8 | efried, rosmaita: those are known issues. look at http://lists.openstack.org/pipermail/openstack-dev/2017-October/123489.html | 12:42 |
*** andreas_s has quit IRC | 12:43 | |
*** LindaWang has quit IRC | 12:43 | |
*** andreas_s has joined #openstack-infra | 12:43 | |
*** liusheng has quit IRC | 12:45 | |
*** chenying_ has quit IRC | 12:45 | |
vsaienk0 | efried: ideally each project should have its own bindep file with only the exact dependencies it needs. Not having this file forces jobs to install the default package list | 12:45 |
*** liusheng has joined #openstack-infra | 12:45 | |
*** bobh has joined #openstack-infra | 12:47 | |
*** mriedem has joined #openstack-infra | 12:47 | |
*** florianf has quit IRC | 12:52 | |
*** florianf has joined #openstack-infra | 12:52 | |
openstackgerrit | Stephen Finucane proposed openstack-dev/pbr master: Discover Distribution through the class hierarchy https://review.openstack.org/399188 | 12:54 |
openstackgerrit | Stephen Finucane proposed openstack-dev/pbr master: Remove unnecessary 'if True' https://review.openstack.org/510806 | 12:54 |
stephenfin | dhellmann, mordred: Want to take a look at those? Think they can merge now (setuptools had changed stuff under the hood) | 12:54 |
*** LindaWang has joined #openstack-infra | 12:56 | |
*** andreas_s has quit IRC | 12:57 | |
*** lucas-hungry is now known as lucasagomes | 12:58 | |
*** andreas_s has joined #openstack-infra | 13:01 | |
*** adarazs_brb is now known as adarazs | 13:02 | |
*** jcoufal has joined #openstack-infra | 13:03 | |
*** esberglu has quit IRC | 13:03 | |
openstackgerrit | Merged openstack-infra/tripleo-ci master: Remove unnecessary scripts from tripleo-ci https://review.openstack.org/510828 | 13:06 |
AJaeger | sambetts: ah, that's what you mean - better ask the zuul folks, can't help with that one | 13:10 |
AJaeger | efried, rosmaita http://lists.openstack.org/pipermail/openstack-dev/2017-October/date.html | 13:11 |
AJaeger | argh, wrong link - I see eumel8 gave the correct one... | 13:11 |
rosmaita | AJaeger thanks | 13:11 |
sambetts | AJaeger: what channel can I find zuul folks in? here or do they have a separate one? | 13:12 |
*** mat128 has joined #openstack-infra | 13:13 | |
AJaeger | sambetts: #zuul ;) | 13:13 |
*** baoli has joined #openstack-infra | 13:13 | |
sambetts | ah not openstack-zuul (tried that one and it didn't exist) | 13:13 |
AJaeger | sambetts: but here as well - just give them a chance to wake up and drink their morning coffee, please :) | 13:13 |
sambetts | of course :D | 13:13 |
AJaeger | rosmaita: regarding bindep: Yes, that might help in this case - you might want to review https://review.openstack.org/#/c/468159/ | 13:15 |
aspiers | FYI, how Google uses Gerrit: https://gitenterprise.me/2017/10/10/gerrit-user-summit-gerrit-at-google/ | 13:15 |
aspiers | pretty interesting setup at scale | 13:15 |
AJaeger | jamespage: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123489.html is the link I wanted to point you to earlier | 13:15 |
*** camunoz has joined #openstack-infra | 13:17 | |
*** esberglu has joined #openstack-infra | 13:18 | |
*** edmondsw_ is now known as edmondsw | 13:19 | |
*** ykarel has quit IRC | 13:20 | |
*** ykarel has joined #openstack-infra | 13:20 | |
*** esberglu has quit IRC | 13:21 | |
*** esberglu has joined #openstack-infra | 13:23 | |
*** tikitavi has joined #openstack-infra | 13:23 | |
*** chlong has joined #openstack-infra | 13:25 | |
*** cshastri has quit IRC | 13:27 | |
*** jaosorior has quit IRC | 13:27 | |
*** sree has joined #openstack-infra | 13:27 | |
*** dbecker has quit IRC | 13:27 | |
*** dbecker has joined #openstack-infra | 13:27 | |
*** sree has quit IRC | 13:31 | |
*** gmann is now known as gmann_afk | 13:31 | |
openstackgerrit | Jean-Philippe Evrard proposed openstack-infra/irc-meetings master: Moving the OpenStack-Ansible meeting time and channel https://review.openstack.org/511479 | 13:33 |
*** sree has joined #openstack-infra | 13:37 | |
*** sbezverk has joined #openstack-infra | 13:39 | |
fungi | okay, i'm here and catching up on scrollback now. i can already see we're still out of inodes on the logs volume even though the tempfile deletion pass and expired log purging are still going from before i went to sleep | 13:41 |
*** sree has quit IRC | 13:41 | |
AJaeger | still out of inodes? Ooops ;( And Ubuntu mirror also still broken ;( | 13:42 |
AJaeger | fungi: you didn't sleep long enough ;) Good morning! | 13:42 |
AJaeger | fungi, jeblair, good news: We had periodic jobs running with Zuul v3 for the first time. Read the backscroll and etherpad for some of the issues it uncovered ;) | 13:43 |
*** cshastri has joined #openstack-infra | 13:43 | |
*** kashyap has joined #openstack-infra | 13:44 | |
kashyap | Can anyone link to the upstream live migration job, please? | 13:47 |
fungi | ykarel: the missing tripleo release tarballs from yesterday are due to the stale ubuntu mirror issue. we'll rerun the jobs to build and publish them as soon as that's sorted out | 13:48 |
*** gouthamr has joined #openstack-infra | 13:49 | |
*** kiennt26 has joined #openstack-infra | 13:50 | |
*** nikhil_ has joined #openstack-infra | 13:52 | |
*** kiennt26 has quit IRC | 13:52 | |
*** nikhil_ is now known as Guest48516 | 13:52 | |
*** kiennt26 has joined #openstack-infra | 13:53 | |
*** Guest48516 is now known as nikhil_k | 13:53 | |
*** andreas_s has quit IRC | 13:54 | |
*** andreas_s has joined #openstack-infra | 13:55 | |
*** kiennt26 has quit IRC | 13:55 | |
*** kiennt26 has joined #openstack-infra | 13:55 | |
ykarel | fungi, Ok | 13:56 |
*** srobert has joined #openstack-infra | 13:56 | |
*** srobert has quit IRC | 13:56 | |
*** smatzek has joined #openstack-infra | 13:56 | |
fungi | the last mirror-update pass ianw started seems to still be running | 13:57 |
*** srobert has joined #openstack-infra | 13:57 | |
openstackgerrit | Petr Kovar proposed openstack-infra/irc-meetings master: Update chair for doc team meeting https://review.openstack.org/511484 | 13:57 |
*** andreas_s has quit IRC | 13:57 | |
AJaeger | fungi, ianw sent a status report via email - on the infra list | 13:57 |
*** andreas_s has joined #openstack-infra | 13:57 | |
fungi | AJaeger: thanks, i'm caught up on irc scrollback so proceeding to e-mail backlog next | 13:58 |
fungi | also, i seem to be having some massive packet loss from here today... not sure what's up | 13:58 |
fungi | (my home broadband uplink i mean) | 13:58 |
AJaeger | not good ;( | 13:59 |
*** sree has joined #openstack-infra | 14:00 | |
Shrews | fungi: Oh no! Pack loss i not goo . Hope t gets b tter fo you. | 14:03 |
fungi | ;) | 14:03 |
*** gildub has quit IRC | 14:06 | |
*** jaosorior has joined #openstack-infra | 14:09 | |
AJaeger | :) | 14:09 |
*** hamzy has quit IRC | 14:12 | |
*** hongbin has joined #openstack-infra | 14:13 | |
*** chlong has quit IRC | 14:13 | |
*** eumel8 has quit IRC | 14:14 | |
*** florianf has quit IRC | 14:15 | |
*** florianf has joined #openstack-infra | 14:15 | |
*** chlong has joined #openstack-infra | 14:19 | |
pabelanger | ianw: thanks | 14:20 |
pabelanger | Ya, inodes look bad still | 14:21 |
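For reference, the quickest way to see the inode situation being described here is an inode query with df; the mount point below is an assumption based on the log paths discussed later in this log.

```bash
# Show inode (not byte) usage on the logs filesystem; path is an assumption.
df -i /srv/static/logs
```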
jeblair | i'm on mirror-update and looking through the screen windows and don't see one running reprepro | 14:22 |
fungi | infra-root: making a judgement call here, i think we're going to need to drop our retention on the logs site | 14:22 |
*** hemna_ has joined #openstack-infra | 14:23 | |
fungi | jeblair: one of the screen windows was when i looked a moment ago | 14:23 |
jeblair | nor do i see a reprepro process running | 14:23 |
pabelanger | jeblair: yes, ianw's last update is that it failed to write to AFS directories | 14:23 |
*** yamamoto has quit IRC | 14:23 | |
pabelanger | /tmp/ianw contains logs | 14:23 |
fungi | jeblair: no, wait, you're right. that was the flock busywait i saw | 14:23 |
jeblair | pabelanger: where's that update? last i saw ianw said it was still running. | 14:24 |
fungi | jeblair: infra mailing list | 14:24 |
pabelanger | jeblair: ianw posted a reply to infra ML | 14:24 |
pabelanger | I'm still getting up to speed myself | 14:24 |
jeblair | "I restarted for good luck," | 14:24 |
*** cshastri has quit IRC | 14:25 | |
fungi | may as well have been "for great justice" | 14:25 |
jeblair | that makes me think we should see either a process running, or a prompt right after a process died | 14:25 |
jeblair | i can find neither | 14:25 |
*** rbrndt has joined #openstack-infra | 14:27 | |
jeblair | also, do we still have the old images? can we delete the new ones from nodepool to get things working again? | 14:27 |
*** psachin has quit IRC | 14:28 | |
pabelanger | looking | 14:29 |
pabelanger | we still haven't upload to rackspace, so we should be good there for now | 14:29 |
pabelanger | I don't think our other clouds have a good image anymore | 14:29 |
jeblair | okay. we should have deleted the new images yesterday, as soon as this started. | 14:30 |
*** armax has joined #openstack-infra | 14:30 | |
fungi | but does nodepool remove the old image from disk if it hasn't been able to upload newer images? | 14:30 |
pabelanger | since the AFS read-only mirror is working, we could also do the work needed to have DIBs use them | 14:30 |
jeblair | fungi: good point, it does not. | 14:30 |
pabelanger | which should fix our issues for now, but pin us at a specific version of xenial | 14:30 |
pabelanger | fungi: yah | 14:30 |
fungi | thinking rackspace may have saved us here | 14:31 |
jeblair | Shrews: around? | 14:31 |
fungi | since there are several people focusing on the mirror situation, i'll focus on the logs site | 14:31 |
jeblair | fungi: ++ | 14:31 |
jeblair | fungi: i support whatever retention period you want to use :) | 14:31 |
fungi | infra-root: last call for objections, i'm planning to reduce our log retention from 4 to 3 weeks (and if that doesn't help fast enough, i'll drop it to 2) | 14:32 |
pabelanger | ++ | 14:32 |
Shrews | jeblair: yes | 14:32 |
clarkb | fungi: seems reasonable, its still just an inode problem right? | 14:32 |
jeblair | Shrews: can you and pabelanger work on getting the old rax images uploaded everywhere? | 14:33 |
fungi | clarkb: yes, but we have so many that traversing them to find tempfiles or 4-week old files seems to be taking too long, so we need something with a higher hit-rate for now i expect | 14:33 |
jeblair | this is a tricky thing we've never done before, so best to proceed carefully | 14:33 |
clarkb | fungi: maybe before deleting the old stuff really quickly do an inode count between some common jobs like tempest just to make sure we haven't regressed there too? | 14:34 |
pabelanger | jeblair: Shrews: if we manually upload, we could use cloud-images section in nodepool? | 14:34 |
*** andreas_s has quit IRC | 14:34 | |
pabelanger | otherwise, will defer to Shrews | 14:34 |
*** andreas_s has joined #openstack-infra | 14:35 | |
clarkb | though that will just tell us if it has changed, not what or why (so maybe less important) | 14:35 |
fungi | clarkb: i'll see if i can spot anything real quick | 14:35 |
Shrews | pabelanger: not sure. lemme catch up on things | 14:35 |
fungi | clarkb: but my guess is that all these other unrelated issues are just resulting in higher log volume as jobs fail more quickly and people are rechecking them all | 14:36 |
clarkb | ya | 14:37 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace https://review.openstack.org/511492 | 14:37 |
jeblair | pabelanger, Shrews: rolling forward is a good option too :) | 14:38 |
pabelanger | jeblair: clarkb: fungi: ^in case we want to go this route. Should make our DIB builds use AFS mirrors for ubuntu | 14:38 |
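Roughly, the change being proposed (511492) points the image builds at the regional Ubuntu mirror. A minimal sketch of the same idea using diskimage-builder directly is below; the element name and mirror URL are assumptions drawn from later discussion, not the exact contents of the patch.

```bash
# Sketch only: build a Xenial image against the AFS-backed Ubuntu mirror.
# DIB_DISTRIBUTION_MIRROR is honored by diskimage-builder's ubuntu-minimal
# element; the mirror URL is the one mentioned later in this log.
export DIB_RELEASE=xenial
export DIB_DISTRIBUTION_MIRROR=http://mirror.dfw.rax.openstack.org/ubuntu
disk-image-create -o ubuntu-xenial ubuntu-minimal vm
```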
fungi | clarkb: actually, i'm not even sure i can easily get a representative sample since we're not successfully uploading logs | 14:38 |
*** xarses has joined #openstack-infra | 14:38 | |
clarkb | fungi: oh right ugh | 14:39 |
fungi | i need to find one which successfully managed to upload all its files | 14:39 |
pabelanger | jeblair: clarkb: fungi: regardless what we do, I believe we also want to disable nb03.o.o, as it has a slow uplink to clouds. Takes upwards of 6 hours to upload, vs 30mins on nb04.o.o | 14:39 |
*** caphrim007 has quit IRC | 14:39 | |
*** links has joined #openstack-infra | 14:39 | |
*** andreww has joined #openstack-infra | 14:41 | |
pabelanger | I've stopped nb03 for now | 14:41 |
*** andreww has quit IRC | 14:41 | |
fungi | clarkb: i expect that it will be easier to find the culprit(s) and address the issue once we have working log uploads again, so i'm going ahead with the >3-week purge now | 14:42 |
pabelanger | Okay, I think ubuntu-xenial-0000001137 is the DIB we want to save | 14:42 |
pabelanger | that is our oldest ubuntu-xenial image | 14:42 |
Shrews | pabelanger: so cloud-images is a feature/zuulv3 thing (builders aren't running that), so that's a no-go | 14:42 |
pabelanger | kk | 14:43 |
*** xarses has quit IRC | 14:43 | |
pabelanger | ubuntu-xenial-0000001138 is also an option I think | 14:43 |
*** andreww has joined #openstack-infra | 14:44 | |
*** supertakumi86 has joined #openstack-infra | 14:44 | |
pabelanger | and I believe that is what we are booting in rackspace now | 14:44 |
pabelanger | trying to confirm | 14:44 |
Shrews | pabelanger: is rebuilding a new image an option? | 14:44 |
openstackgerrit | Michael Turek proposed openstack/diskimage-builder master: Add iscsi-boot element https://review.openstack.org/511494 | 14:44 |
andreykurilin | hi folks! There are a lot of POST_FAILURES in zuul_v2. Is it ok? | 14:44 |
pabelanger | Shrews: it is, but we'll need 511492 | 14:44 |
pabelanger | I'm happy to give it a try | 14:44 |
*** supertakumi86 has quit IRC | 14:45 | |
openstackgerrit | Michael Turek proposed openstack/diskimage-builder master: Add iscsi-boot element https://review.openstack.org/511494 | 14:45 |
pabelanger | infra-root: do we want to roll forward an image with 511492 first? | 14:45 |
jeblair | clarkb: want to work on a status alert? | 14:45 |
pabelanger | but, we should first copy ubuntu-xenial-0000001138 or ubuntu-xenial-0000001137 to be safe | 14:46 |
*** jkilpatr has quit IRC | 14:46 | |
clarkb | jeblair: sure | 14:46 |
jeblair | pabelanger: whatever you and Shrews think is safest and quickest | 14:46 |
fungi | for the current 3-week expiration pass i've also switched from -exec rm {} \; to -delete and removed the check for removing empty directories for now in hopes this will cover ground more quickly | 14:46 |
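As an illustration of the switch described above (the path and the 21-day window are examples, not the exact maintenance job):

```bash
# Before: forks one rm process per expired file.
find /srv/static/logs -type f -mtime +21 -exec rm {} \;
# After: unlink from within find itself, saving a fork/exec per file.
find /srv/static/logs -type f -mtime +21 -delete
```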
*** iyamahat has joined #openstack-infra | 14:46 | |
pabelanger | jeblair: ack | 14:47 |
pabelanger | Shrews: okay, so sounds like you want to try new image? I'll let you review 511492 and I'll save our DIBs | 14:47 |
clarkb | how does "Job log retention is being reduced to get inode consumption under control. Separately we are updating job instance images to use our ubuntu mirrors temporarily addressing the problems with Xenial packaging." | 14:48 |
clarkb | er how does that look | 14:48 |
Shrews | pabelanger: i think so. if we delete the bad images, i think a new one is just going to be built anyway | 14:48 |
clarkb | maybe too verbose | 14:48 |
jeblair | clarkb: maybe cover the problem and symptoms first. i don't think folks need to know what we're doing | 14:48 |
fungi | s/updating/reverting/ maybe | 14:48 |
fungi | but yeah | 14:49 |
pabelanger | Shrews: okay, lets land 511492 and give it a go | 14:49 |
Shrews | pabelanger: +A'd | 14:49 |
*** iyamahat_ has joined #openstack-infra | 14:49 | |
clarkb | "Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow." | 14:50 |
clarkb | that better? | 14:50 |
jeblair | clarkb: wfm | 14:50 |
fungi | ship it | 14:50 |
pabelanger | ++ | 14:50 |
Shrews | pabelanger: if nb03 is stopped, 0000001137 and 0000001138 will not be deleted, but good to make a backup anyway | 14:50 |
pabelanger | Shrews: ya, doing that in /opt/nodepool_dib.backup-pabelanger now | 14:51 |
*** iyamahat__ has joined #openstack-infra | 14:51 | |
*** Swami has joined #openstack-infra | 14:51 | |
Shrews | pabelanger: once your change lands, we'll delete 0000001140 and 0000001141 | 14:51 |
clarkb | #status alert Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow. | 14:51 |
openstackstatus | clarkb: sending alert | 14:51 |
jeblair | i'm going to add a new volume to the afs server because the debian volume has 7G free. then i will increase the quota on that volume. then i will reboot both the mirror-update server and afs01.dfw. then i will attempt the mirror repair again. | 14:51 |
pabelanger | clarkb: jobs that don't need gnutls, could stop using bindep-fallback.txt and properly add their own bindep.txt also | 14:51 |
pabelanger | not long term, but help mitigate the issue | 14:52 |
*** iyamahat has quit IRC | 14:52 | |
*** jkilpatr has joined #openstack-infra | 14:53 | |
*** andreas_s has quit IRC | 14:53 | |
*** iyamahat__ has quit IRC | 14:53 | |
*** yamahata has joined #openstack-infra | 14:53 | |
*** iyamahat__ has joined #openstack-infra | 14:53 | |
-openstackstatus- NOTICE: Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow. | 14:53 | |
*** ChanServ changes topic to "Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow." | 14:54 | |
clarkb | Anything that uses cryptography needs it though right? That is going to be many things | 14:54 |
clarkb | the subset that doesn't need it is probably small enough that identifying it is too much work | 14:54 |
*** rcernin has quit IRC | 14:55 | |
*** iyamahat_ has quit IRC | 14:55 | |
*** hamzy has joined #openstack-infra | 14:55 | |
openstackstatus | clarkb: finished sending alert | 14:57 |
*** iyamahat__ has quit IRC | 14:58 | |
*** iyamahat has joined #openstack-infra | 14:58 | |
pabelanger | clarkb: Shrews: 511492 failed with POST_FAILURE. I suggest we add nodepool to emergency file, and manually apply. While we attempt to recheck it | 14:58 |
clarkb | pabelanger: wfm, though its builders that need it? | 14:59 |
jeblair | clarkb, pabelanger: is the npm mirror in production? | 14:59 |
pabelanger | no, there is a patch up to remove it from AFS | 15:00 |
pabelanger | clarkb: ya, nb04 | 15:00 |
jeblair | okay, i'll kill the process | 15:00 |
Shrews | pabelanger: are you confident that will fix the image build? because the other option here is to just pause image builds, delete the bad images, and run with the older images for a while | 15:01 |
jeblair | oh that's nice, the npm mirror releases even if the process is killed | 15:01 |
*** eharney has quit IRC | 15:02 | |
*** chlong has quit IRC | 15:02 | |
*** andreas_s has joined #openstack-infra | 15:02 | |
Shrews | but if we know it will fix it, would be best to move forward with the fix, IMO | 15:02 |
clarkb | Shrews: I dont think any of our images are old enough to work at this point | 15:03 |
jeblair | clarkb: rax | 15:03 |
pabelanger | Shrews: no, we need to first upload the good images, only rackspace today have them | 15:03 |
clarkb | right except in rax | 15:03 |
pabelanger | so, that process needs to be manual | 15:03 |
jeblair | clarkb: we have the images on disk | 15:03 |
clarkb | oh right we keep all the formats | 15:03 |
clarkb | until all formats can be removed | 15:04 |
jeblair | and this is why | 15:04 |
Shrews | oh, rax has the oldest ones. got it | 15:05 |
*** jaosorior has quit IRC | 15:05 | |
pabelanger | okay, patch manually applied | 15:07 |
pabelanger | ready to image-build xenial | 15:07 |
pabelanger | clarkb: Shrews:^ | 15:07 |
*** dhajare has quit IRC | 15:08 | |
Shrews | pabelanger: cool. kick it off | 15:08 |
pabelanger | started | 15:09 |
evrardjp | hey, is there an env var I can use in my job to check if I am under zuul v3 or jenkins? | 15:09 |
pabelanger | evrardjp: I think we said $(whoami)? If user is jenkins, then you are jenkins | 15:10 |
pabelanger | otherwise it would be zuul | 15:10 |
evrardjp | ok | 15:10 |
AJaeger | pabelanger: NO! | 15:10 |
pabelanger | evrardjp: listen to AJaeger | 15:10 |
evrardjp | I couldn't use that :) | 15:10 |
AJaeger | evrardjp: check email by Monty, let me find it quickly... | 15:10 |
*** annp has joined #openstack-infra | 15:11 | |
evrardjp | AJaeger: let me search then | 15:11 |
pabelanger | Shrews: k, we are pulling packages from http://mirror.dfw.rax.openstack.org/ubuntu now | 15:11 |
AJaeger | evrardjp: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123049.html | 15:11 |
evrardjp | wow that was fast | 15:11 |
AJaeger | ;) | 15:11 |
AJaeger | bbl, shutting down here so won't be able to read backscroll for some time... | 15:12 |
*** sree has quit IRC | 15:12 | |
*** AJaeger has quit IRC | 15:12 | |
evrardjp | AJaeger but that's only openstack's zuul's behavior, I don't have a magic variable I can use outside if need be | 15:13 |
evrardjp | at least I have something. | 15:13 |
evrardjp | thanks! | 15:13 |
Shrews | pabelanger: i think we'll need to delete 1141 before the new one will upload. i see 1141 still uploading to rax too | 15:14 |
*** iyamahat has quit IRC | 15:14 | |
*** yamahata has quit IRC | 15:14 | |
*** sadasu has joined #openstack-infra | 15:15 | |
Shrews | (my weechat session is picking a very poor time to randomly freeze on me) | 15:15 |
*** andreas_s has quit IRC | 15:15 | |
*** eharney has joined #openstack-infra | 15:15 | |
pabelanger | k, we had a minor issue with --allow-unauthenticated, working on patch | 15:16 |
*** chlong has joined #openstack-infra | 15:16 | |
jeblair | is the rubygems mirror in production? | 15:19 |
pabelanger | no, we are using reverse proxy for that also | 15:20 |
jeblair | okay we *really* need to clean this stuff up | 15:20 |
fungi | infra-root: the three-week expiration is not gaining traction fast enough to keep pace with new log uploads either. i'm going to switch to a two-week expiration as a last-ditch effort before we have to consider disabling uploads for a while to bring utilization back down or randomly deleting trees of the filesystem, since using find to stat modify times is just too slow | 15:20 |
jeblair | npm and ruby are both making this work very difficult | 15:20 |
jeblair | infra-root: i'm going to delete both from afs | 15:21 |
fungi | jeblair: sounds good | 15:21 |
clarkb | fungi: do you think stat might not be able to keep up? or are older logs less inody? | 15:22 |
fungi | clarkb: i expect it's a combination of both of those plus we're uploading a lot more logs with zuul v3 also running check jobs | 15:22 |
fungi | problem is finding those newer inody job logs and purging them is at least as expensive if not moreso than the date-based expirations | 15:23 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace https://review.openstack.org/511492 | 15:23 |
fungi | if we had the tree sharded by date/time somehow this would be a cinch | 15:23 |
pabelanger | clarkb: Shrews: ^updates needed for AFS mirrors in DIBs. Logic come from nodepool-dsvm jobs | 15:23 |
pabelanger | helps if I git add | 15:24 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace https://review.openstack.org/511492 | 15:24 |
*** yamamoto has joined #openstack-infra | 15:24 | |
fungi | clarkb: and also as i said earlier, all the recent issues in general are causing people to recheck changes far more frequently in vain hope that they'll suddenly work | 15:24 |
clarkb | pabelanger: looks like it should work | 15:25 |
pabelanger | clarkb: I think we might have to ask tripleo to drop more things, /etc for example | 15:25 |
pabelanger | with 2 zuuls running and uploading to logs.o.o, I won't be surprised if tripleo jobs are eating all the inodes now | 15:26 |
clarkb | pabelanger: ya once we've got some breathing room we'll need to gather data and see where inode usage is | 15:26 |
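A sketch of the kind of data gathering clarkb mentions, once there is headroom; the path is assumed, and a full walk of a tree this size is slow, so this is a one-off audit rather than something to run routinely.

```bash
# Count files (inodes) per top-level directory of the logs tree and show
# the heaviest ones; adjust the path/depth for per-job breakdowns.
for d in /srv/static/logs/*/; do
  printf '%s ' "$d"
  find "$d" -type f | wc -l
done | sort -k2 -n | tail -20
```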
jeblair | i'm going to restart afs02.dfw | 15:27 |
*** udesale has quit IRC | 15:27 | |
*** electrofelix has quit IRC | 15:30 | |
fungi | i have my fingers crossed on the current >2-week purge, but it's not looking good so far and i'm afraid we're going to have to do recursive deletes based on some sort of filesystem glob rather than by age to get things back to sanity before we can make further progress through less disruptive means | 15:30 |
jeblair | i'm going to restart afs01.dfw now | 15:31 |
*** yamamoto has quit IRC | 15:31 | |
fungi | like, i can delete _all_ logs for specific jobs by name, or just remove jobs at random by wiping a high-level subdirectory or two | 15:31 |
jeblair | and mirror-update | 15:31 |
*** dangers has joined #openstack-infra | 15:32 | |
clarkb | fungi: we might be able to construct some deletes based on change numbers to roughly correlate to dates? | 15:32 |
*** LindaWang has quit IRC | 15:32 | |
fungi | that would be a very rough correlation | 15:32 |
fungi | like, what, delete jobs for any change id numbers below a certain threshold? | 15:34 |
*** andreas_s has joined #openstack-infra | 15:34 | |
*** Swami has quit IRC | 15:34 | |
*** Swami has joined #openstack-infra | 15:35 | |
*** hashar has joined #openstack-infra | 15:36 | |
fungi | a quick listing of 6-digit change ids prior to 500000 says there are 4646 of those | 15:36 |
clarkb | ya | 15:36 |
clarkb | though at this point those may actually be relatively active as they would've aged out already on their own otherwise | 15:37 |
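For illustration, the change-number cutoff being weighed here could look something like the dry run below; the two-level <last-two-digits>/<change-id> layout is taken from the globs used elsewhere in this log, and 500000 is just the example threshold from above.

```bash
# Dry run: print the rm commands instead of executing them.
for d in /srv/static/logs/??/??????; do
  change=$(basename "$d")
  if [ "$change" -lt 500000 ] 2>/dev/null; then
    echo rm -rf "$d"
  fi
done
```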
*** iyamahat has joined #openstack-infra | 15:37 | |
*** vsaienk0 has quit IRC | 15:37 | |
*** ykarel has quit IRC | 15:37 | |
*** annp has quit IRC | 15:37 | |
pabelanger | I'm starting to like the idea of top-level hash by UTC date again | 15:38 |
*** iyamahat has quit IRC | 15:38 | |
*** iyamahat has joined #openstack-infra | 15:38 | |
jeblair | let's not redesign the system now | 15:38 |
*** ykarel has joined #openstack-infra | 15:38 | |
pabelanger | okay, ubuntu-xenial now building properly with latest 511492 applied | 15:39 |
pabelanger | in devstack-cache element now | 15:39 |
*** e0ne has quit IRC | 15:40 | |
*** andreas_s has quit IRC | 15:43 | |
clarkb | find gate-tripleo-ci-centos-7-3nodes-multinode-nv | grep -v '/tmp/ansible' | wc -l reports 39549 /me finds some other comparisons | 15:43 |
*** yamamoto has joined #openstack-infra | 15:45 | |
*** yamamoto has quit IRC | 15:45 | |
clarkb | er that was for multiple runs, it's 9889 for a single run. 917 for a single run of the tempest multinode job | 15:46 |
pabelanger | 100x | 15:46 |
pabelanger | :( | 15:46 |
*** links has quit IRC | 15:47 | |
*** andreas_s has joined #openstack-infra | 15:48 | |
clarkb | legacy multinode job under zuulv3 is roughly in that 900 range too | 15:48 |
jeblair | pabelanger: how do i run the reprepro _detect command? | 15:48 |
pabelanger | jeblair: it would be: reprepro _detect | 15:49 |
pabelanger | reprepro --configdir /etc/reprepro/ubuntu _detect maybe | 15:49 |
jeblair | pabelanger: i ran: | 15:50 |
jeblair | cd /afs/.openstack.org/mirror/ubuntu | 15:50 |
jeblair | find pool -type f -print | reprepro -b . _detect | 15:50 |
jeblair | and then got: | 15:50 |
jeblair | Error opening config file './conf/distributions': No such file or directory(2) | 15:50 |
jeblair | pabelanger: so i'm looking for the exact command you or ianw ran | 15:50 |
jeblair | should i not be doing the find thing? | 15:51 |
jeblair | that's what it said in step 3 of https://github.com/esc/reprepro/blob/master/docs/recovery which ianw linked | 15:51 |
ihrachys | all those POST_FAILURE failures that happened on all my patches yesterday, are those gone and we can recheck? | 15:51 |
ihrachys | failures as in https://review.openstack.org/#/c/507966/ | 15:51 |
jeblair | ihrachys: no, see channel topic | 15:51 |
ihrachys | ok thanks | 15:52 |
*** kiennt26 has quit IRC | 15:52 | |
Shrews | pabelanger: so... i'm not sure how any image uploads are working at all. i'm seeing shade exceptions during image upload in builder logs | 15:52 |
pabelanger | Shrews: not sure either, I haven't looked at logs yet | 15:53 |
*** vsaienk0 has joined #openstack-infra | 15:54 | |
pabelanger | jeblair: hmm, let me see, I didn't try _detect | 15:54 |
pabelanger | jeblair: I think you need to pass --confdir /etc/reprepro/ubuntu to your reprepro command | 15:55 |
*** yamahata has joined #openstack-infra | 15:55 | |
jeblair | pabelanger: okay -- is that the command i should be running? | 15:55 |
*** egonzalez has quit IRC | 15:55 | |
jeblair | i'm basically trying to just follow whatever instructions you and ianw gave. i thought it was clear, but it's becoming less so | 15:55 |
*** priteau has quit IRC | 15:55 | |
jeblair | pabelanger: like, i'm very confused why you don't think i should run _detect. what do you think i should do? | 15:56 |
pabelanger | jeblair: I am not sure, which document are you looking at currently? What is it you want reprepro to do? | 15:56 |
*** kashyap has left #openstack-infra | 15:56 | |
jeblair | https://github.com/esc/reprepro/blob/master/docs/recovery | 15:56 |
jeblair | pabelanger: ianw said checksums.db was bad. that says that's what you do when checksums.db is bad. | 15:56 |
jeblair | pabelanger: is that not what you were doing yesterday? | 15:56 |
pabelanger | no, I only tried step 1 (rereference) before I passed the torch to ianw | 15:57 |
pabelanger | so, this is new process for me also | 15:57 |
jeblair | pabelanger: how did you determine referencesdb was bad? | 15:57 |
pabelanger | jeblair: I didn't, I never deleted referencedb, but just tried rereference to see if there was any corruption | 15:58 |
pabelanger | the command worked properly | 15:58 |
jeblair | pabelanger: thanks. i'll take it from here. | 15:58 |
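Pieced together from the exchange above and the linked recovery doc, the checksums.db rebuild being attempted looks roughly like this; the paths and the explicit --confdir are a best guess rather than the exact command that was run.

```bash
# Step 3 of the reprepro recovery doc: re-detect every pool file so the
# checksums database can be rebuilt, then audit the result as suggested.
cd /afs/.openstack.org/mirror/ubuntu
find pool -type f -print | reprepro --confdir /etc/reprepro/ubuntu -b . _detect
reprepro --confdir /etc/reprepro/ubuntu -b . check
reprepro --confdir /etc/reprepro/ubuntu -b . checkpool fast
```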
pabelanger | okay | 15:59 |
openstackgerrit | Brian Rosmaita proposed openstack-infra/project-config master: Remove workflow +1 on glance_store from swift-core https://review.openstack.org/511517 | 15:59 |
*** andreas_s has quit IRC | 16:00 | |
*** links has joined #openstack-infra | 16:01 | |
*** ralonsoh has quit IRC | 16:01 | |
*** chlong has quit IRC | 16:03 | |
*** tikitavi has quit IRC | 16:04 | |
jeblair | #status log removed mirror.npm volume from afs | 16:04 |
openstackstatus | jeblair: finished logging | 16:04 |
*** ykarel is now known as ykarel|afk | 16:05 | |
*** edmondsw has quit IRC | 16:05 | |
pabelanger | Shrews: it looks like maybe just the rackspace uploads have the issue in shade. I can see an inap upload working in the debug log | 16:06 |
*** erlon has quit IRC | 16:07 | |
smatzek | The trove gate has been broken by one thing or another since the PTG. I've been working for 2-3 weeks trying to fix it up. In the past couple days I've seen errors like this from the gate-trove-python27-ubuntu-xenial checks whereas the openstack-tox-py27 runs clean. Is this a known issue with v3? "libcurl4-gnutls-dev : Depends: libcurl3-gnutls (= 7.47.0-1ubuntu2.2) but 7.47.0-1ubuntu2.3 is to be installed" | 16:08 |
smatzek | http://logs.openstack.org/87/507087/15/check/gate-trove-python27-ubuntu-xenial/3168c43/console.html | 16:08 |
jeblair | smatzek: we're not running v3. it's a known issue. see topic. | 16:08 |
Shrews | pabelanger: i think it's only providers using tasks to upload images | 16:09 |
pabelanger | Shrews: okay, I think that is only rackspace for us | 16:09 |
*** dangers has quit IRC | 16:09 | |
smatzek | thanks, I read the upload issue but glazed over the gnutls | 16:09 |
*** masber has quit IRC | 16:09 | |
clarkb | fungi: poking around the tmp/ansible fix definitely seems to have cut down on tripleo inode consumption but they are still about an order of magnitude more inodes per job run than say multinode devstack + tempest | 16:10 |
pabelanger | we're just compressing ubuntu-xenial DIB now, shouldn't be much longer before we start uploads | 16:10 |
*** andreas_s has joined #openstack-infra | 16:10 | |
pabelanger | smatzek: hopefully not much longer, new images should be coming online in the next hour | 16:10 |
*** dangers has joined #openstack-infra | 16:10 | |
*** camunoz has quit IRC | 16:11 | |
clarkb | and are significantly smaller under zuulv3 I think because log collection is broken there for some reason for them | 16:11 |
fungi | clarkb: good to know. we _could_ just delete the logs for those jobs specifically. is there a solid file glob i could match on to get all those? | 16:11 |
*** edmondsw has joined #openstack-infra | 16:12 | |
fungi | trying to do it by arbitrary pattern matching with find is not going to be fast enough | 16:12 |
clarkb | fungi: gate-tripleo-ci- is the job name prefix | 16:12 |
clarkb | the ara report at the top level of all zuulv3 jobs is coming in at 400-600 inodes depending on the job, it looks like | 16:13 |
clarkb | which may significantly bump inode overhead for all the things that weren't really copying many files before hand | 16:13 |
Shrews | pabelanger: i am REALLY confused as to how rax has _any_ uploads | 16:13 |
*** dhinesh has joined #openstack-infra | 16:14 | |
*** andreas_s has quit IRC | 16:14 | |
clarkb | 455 inodes for nova pep8 under zuulv3, 24 under v2 | 16:15 |
jeblair | pabelanger: i have a dumb question -- is it possible for us to just copy the db files from the most recent read-only release into place, and then run reprepro normally? | 16:15 |
clarkb | 441 of the v3 side is ara | 16:15 |
jeblair | fungi, clarkb: ^ do you know enough about reprepro to know if that's okay? | 16:15 |
*** AJaeger has joined #openstack-infra | 16:16 | |
*** edmondsw has quit IRC | 16:16 | |
fungi | jeblair: that _seems_ like it should be okay, but i don't really know. i get the impression that's where it keeps its state anyway so makes sense that it should be able to roll forward again from there | 16:16 |
pabelanger | jeblair: I don't see why we can't try. reprepro should be smart enough to detect differences and update where needed. reprepro check and reprepro checkpool should be how we audit | 16:16 |
*** priteau has joined #openstack-infra | 16:16 | |
jeblair | ya -- like maybe it re-downloads some new files or something. that'd be fine. | 16:16 |
jeblair | okay, i'll give that a shot | 16:16 |
jeblair | and verify with pabelanger's suggested commands | 16:17 |
fungi | pabelanger: unfortunately the checks seem to be designed to retrieve every file from the filesystem (so over slow udp datagrams in the case of afs) to recalculate checksums, right? | 16:17 |
jeblair | fungi: well, that's been the repair process to date anyway | 16:17 |
fungi | or does it have a check mode to just verify filenames? | 16:18 |
* SpamapS peeking back in and seeing reprepro and inode issues... quickly retreats like a groundhog seeing his shadow | 16:18 | |
*** dangers` has joined #openstack-infra | 16:19 | |
*** AJaeger has quit IRC | 16:19 | |
*** AJaeger has joined #openstack-infra | 16:19 | |
*** dangers has quit IRC | 16:20 | |
fungi | clarkb: are there specific paths under those gate-tripleo-ci-* log trees i should remove, or are those scattered and better to just remove the entire tree for each job matching that name pattern? | 16:20 |
SpamapS | reprepro's check process likely is also reading all of the metadata from every package. | 16:20 |
clarkb | fungi: logs/undercloud/tmp/ansible logs/ara_oooq logs/undercloud/etc seem to be large hitters | 16:21 |
*** AJaeger has quit IRC | 16:21 | |
*** sadasu has quit IRC | 16:21 | |
clarkb | fungi: I'm currently trying to get a count for what I hope is a representative nova change to see if we should expect to be able to store 4 weeks of nova change logs | 16:21 |
clarkb | change 509039 has ~157208 inodes | 16:22 |
mordred | morning all | 16:22 |
*** AJaeger has joined #openstack-infra | 16:22 | |
clarkb | we have inodes for about 5122 nova changes if we treat that as representative | 16:22 |
*** camunoz has joined #openstack-infra | 16:22 | |
mordred | clarkb: holy crap! - 157208 is a lot of inodes for one change | 16:22 |
pabelanger | fungi: yah, it was a slow process last night when I did reprepro export, that walked all the files in the pool for generating indexes | 16:22 |
clarkb | mordred: ya I'm going to start trying to break that down | 16:23 |
*** andreas_s has joined #openstack-infra | 16:24 | |
fungi | clarkb: thanks, i'm still waiting for ls | wc -l to return a count for the pattern /srv/static/logs/??/??????/*/*/gate-tripleo-ci-* | 16:24 |
fungi | well, ls -d specifically | 16:24 |
pabelanger | clarkb: as I am watching this DIB rebuild again, I'm noticing we're approaching 1 hour build times again. I think it is possible we might want to delete our git cache on builders for a fresh (and smaller) start shortly | 16:26 |
stephenfin | dhellmann: Got a few mins? | 16:27 |
stephenfin | Curious about the rationale behind pre/post-versioning in pbr | 16:27 |
dhellmann | stephenfin : I'm on a call. ~30 min? | 16:27 |
* stephenfin didn't know you could do 'Sem-Ver:' trailers | 16:27 | |
stephenfin | dhellmann: I'll probably be gone by then. Tomorrow is fine :) | 16:27 |
dhellmann | stephenfin : ok. or email to the -dev list (this sounds like something others might be interested in and have input into) | 16:28 |
jeblair | it looks like the checksumsdb on the ro volume is corrupt, so i'm back to running the find | _detect command | 16:28 |
AJaeger | fungi, what about removing /srv/static/logs/0b0bbd59a9be905da869ace3797919f9cd6217/ etc - those are logs that nobody finds... | 16:28 |
AJaeger | these came from initial Zuul v3 logs | 16:28 |
clarkb | mordred: check's patchset 4 of that change is 116373 | 16:28 |
clarkb | looks like ~8 rechecks | 16:28 |
clarkb | I think there is a lot of weight behind the "constant rechecks are just making it worse" theory based on ^ | 16:29 |
jeblair | i am running that command with a copy of the db directory on local disk, so if there is any trouble writing to the fileserver, we shouldn't lose the whole operation. | 16:29 |
mordred | clarkb: yah - that makes an amount of sense | 16:29 |
pabelanger | clarkb: Shrews: ubuntu-xenial DIB finished, we've started uploads | 16:29 |
jeblair | but as ianw calculated, best case for this is probably 6 hours | 16:30 |
Shrews | pabelanger: ++ | 16:30 |
clarkb | openstack-tox-pep8 is ~1800 and old pep8 job is ~200 over those rechecks | 16:30 |
clarkb | maybe we should consider not building a static ara for every build? we could maybe just upload them on failures? | 16:30 |
*** jkilpatr_ has joined #openstack-infra | 16:31 | |
*** jpich has quit IRC | 16:31 | |
fungi | AJaeger: sure, i can do that and maybe it'll free up some as well | 16:31 |
pabelanger | clarkb: wow, large difference | 16:32 |
*** jkilpatr has quit IRC | 16:32 | |
pabelanger | clarkb: sounds like we need to update ARA regardless. But today I only look at it if there is a failure | 16:32 |
*** andreas_s has quit IRC | 16:33 | |
*** trown is now known as trown|lunch | 16:33 | |
clarkb | pabelanger: ya me too, which is why I had that idea:) it is really handy for understanding failures, but skimming successes tends to happen in the job output log for me (or job specific logs) | 16:33 |
pabelanger | clarkb: maybe we can see how much effort would be involved from dmsimard to clean up little files | 16:34 |
Shrews | pabelanger: I'm actually unclear as to why those are already uploading since 1141 is less than 24 hours old | 16:34 |
*** ykarel|afk has quit IRC | 16:34 | |
*** sambetts is now known as sambetts|afk | 16:34 | |
pabelanger | Shrews: not sure myself | 16:34 |
SpamapS | is there a summary of why some jobs take up so many inodes? Purely curious. | 16:35 |
*** Apoorva has joined #openstack-infra | 16:35 | |
*** Guest66337 has quit IRC | 16:35 | |
clarkb | SpamapS: in the general case, ara seems to be a big hitter. In specific cases some tripleo jobs were copying all of ansibles /tmp contents | 16:36 |
*** hashar is now known as hasharAway | 16:36 | |
clarkb | SpamapS: there are also bits of some jobs like tripleo that copy a good chunk of /etc, which, if it grabs /etc/selinux, adds about an ara's worth of files too | 16:36 |
*** vsaienk0 has quit IRC | 16:36 | |
pabelanger | 2017-10-12 16:36:50,057 INFO nodepool.builder.UploadWorker.0: Image build ubuntu-xenial-0000001155 in ovh-bhs1 is ready | 16:37 |
pabelanger | OVH nice and fast :) | 16:37 |
clarkb | oh also tripleo has its internal ara_oooq which is much larger than the zuul ara | 16:37 |
jeblair | clarkb: ara is optional in zuulv3; the process to disable it is just to uninstall it and restart the executors. | 16:38 |
pabelanger | clarkb: we could propose disabling ara_oooq now, as it wouldn't be needed under zuulv3? | 16:38 |
Shrews | pabelanger: oh, nm. uploads happen anytime. dib rebuilds happen only after 24 hours | 16:38 |
SpamapS | clarkb: if it's just for debugging and not quick viewing... tar instead of rsync? | 16:38 |
jeblair | SpamapS: that would likely be so difficult to use as to not be worthwhile; i would find it easier to just read the raw json. | 16:39 |
pabelanger | SpamapS: clarkb: I think stackviz has the right idea of what it does, a single json file IIRC, to render data from | 16:39 |
boden | hi, recently I've seen a number of failures in the v2 jobs "ERROR: These requested packages were not installed.." is this a known issue? | 16:39 |
pabelanger | boden: it is known, we are pushing up new images now to try and fix it | 16:39 |
jeblair | i think i can channel dmsimard here and say the right way to use ara in this situation is with a centralized reporting database. | 16:39 |
dmsimard | I don't believe there is a short term opportunity to reduce the amount of files generated in an ARA report, it's basically one html file per result/host/etc. The best option is to consider a centralized instance of sorts (not unlike openstack-health) | 16:40 |
fungi | SpamapS: if i were to redesign this entire system, i'm starting to think that it would have made more sense to archive a tarball of logs from each build and then have a human-friendly frontend which temporarily unpacks and serves it on demand... but then you also get the benefits that anyone or any system who wants to grab all the logs for a build can just pull that tarball | 16:40 |
dmsimard | jeblair: thou hath summoned me | 16:40 |
AJaeger | we also store both job-output.json.gz and job-output.txt.gz - and the json is 3 times as large as the txt | 16:40 |
jeblair | dmsimard: nailed it! | 16:40 |
boden | pabelanger: ack | 16:40 |
boden | thanks | 16:40 |
pabelanger | jeblair: dmsimard: woah | 16:40 |
jeblair | AJaeger: the json has all the information in it; we still haven't gotten the text quite right yet. it very frequently does not have info we need to diagnose errors. | 16:41 |
SpamapS | jeblair: I was just thinking for the instances where people try to grab all of /etc, not ARA. | 16:41 |
AJaeger | jeblair: ack | 16:41 |
jeblair | SpamapS: ah | 16:41 |
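A minimal sketch of the tar-instead-of-rsync idea for debug-only trees like /etc: one archive (one inode) on the log server instead of thousands of tiny files. The destination path is hypothetical, and whether that tradeoff is acceptable is exactly the open question here.

```bash
# Pack /etc once on the node, then publish the single tarball with the
# rest of the job logs instead of rsyncing the tree file-by-file.
mkdir -p /tmp/job-logs
sudo tar -czf /tmp/job-logs/etc.tar.gz -C / etc
```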
dmsimard | The thing about a centralized instance is that we don't want every ansible run to synchronously report each task/result to a central (mysql) database remotely, just the added latency I suspect would be noticeable -- especially for regions farther away from the database server. | 16:42 |
dmsimard | We would need to asynchronously import the data, somehow. | 16:42 |
clarkb | SpamapS: well we've also asked that that stop happening and it has improved over time, but still finding cases here and there where I think it must be a blacklist instead of a whitelist of copies | 16:42 |
dmsimard | A quick hack would likely be to recover the sqlite database and import it in a central location a bit like we trigger logstash things | 16:43 |
openstackgerrit | wes hayutin proposed openstack-infra/tripleo-ci master: be more prescriptive in log collection https://review.openstack.org/511526 | 16:43 |
SpamapS | clarkb: maybe we should du -i in the executor too. | 16:43 |
SpamapS | actually | 16:43 |
SpamapS | s/maybe/ | 16:43 |
SpamapS | / | 16:43 |
SpamapS | bah | 16:43 |
dmsimard | So we wouldn't generate the report on each job but we would copy the sqlite database. The database is pretty small. | 16:43 |
SpamapS | DiskAccountant should have stopped this. | 16:43 |
SpamapS | We're limiting on storage bytes, but we should also limit inodes. | 16:44 |
jeblair | SpamapS: our limit is super high until after the cutover because we have to support some legacy jobs that use a lot of space | 16:44 |
jeblair | SpamapS: also, we still haven't cut over | 16:44 |
SpamapS | this happened on log storage from check jobs yeah? | 16:44 |
dmsimard | I'd need to think about the process involved in importing databases over and over. | 16:44 |
SpamapS | And just the duplicated check load? | 16:44 |
jeblair | so i'd like to suggest that we focus this conversation right now to whether we need to make any emergency changes to zuulv3 right now to stop our inode use? | 16:44 |
jeblair | because the ci system has had at least a partial outage for over a day | 16:45 |
jeblair | and we should focus on nothing other than correcting that now. | 16:45 |
jeblair | when we clear that status alert, we can talk about what to do later in v3 | 16:45 |
clarkb | jeblair: I think it is definitely a large overhead for the previously "small" jobs | 16:45 |
*** yamamoto has joined #openstack-infra | 16:45 | |
clarkb | er ara by default for each job is | 16:45 |
dmsimard | ara is probably a nice to have, if it can help alleviate the load we can toggle it off -- no need to do it by uninstalling ara from the executor IMO, we could just disable the generation from inside the role that does the generation | 16:46 |
pabelanger | If ARA is a large amount of inodes over v2, then (reluctantly) I'd be in favor of disabling it on zuulv3 for now | 16:46 |
clarkb | I'd be happy trying it with just failed jobs to start if that is easy | 16:46 |
SpamapS | Right, I was suggesting we have DiskAccountant kill jobs that abuse the inode table of the executor. But I guess the problem is actually not the executor running out of inodes, but the end target running out. | 16:46 |
jeblair | how about we just turn off the v3 check pipeline right now? | 16:46 |
*** armaan has joined #openstack-infra | 16:46 | |
clarkb | that works too | 16:46 |
*** andreas_s has joined #openstack-infra | 16:46 | |
pabelanger | sure | 16:46 |
fungi | that may help drop some load. even with `rm -rf /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/{undercloud/tmp/ansible,ara_oooq,undercloud/etc}` going, we're still not gaining ground | 16:46 |
*** smatzek has quit IRC | 16:46 | |
jeblair | mordred: are you available? | 16:46 |
jeblair | guess not | 16:47 |
*** smatzek has joined #openstack-infra | 16:47 | |
jeblair | clarkb: you want to disable v3 check? | 16:47 |
SpamapS | Yeah I think that's the thing to do. | 16:47 |
dmsimard | I'll go ahead and propose a toggle to disable the ara report generation just in case, it could come handy in the future | 16:47 |
clarkb | jeblair: ya I think we should to try and get logs fs into a happier state | 16:48 |
fungi | basically, i think short of burning down whole swaths of the logs tree, i don't think we can delete files faster than we're uploading them at the moment | 16:48 |
jeblair | clarkb: sorry, i meant, are you available to make that change? | 16:48 |
clarkb | oh yes, I can | 16:48 |
*** dimak has quit IRC | 16:48 | |
jeblair | clarkb: cool, it's yours | 16:48 |
clarkb | I'll push that up momentarily | 16:48 |
fungi | i suppose another option is we could artificially constrain our nodepool quota so we run fewer jobs at a time | 16:49 |
fungi | i mean, after the stop the v3 check pipeline | 16:49 |
*** leyal has quit IRC | 16:49 | |
*** lihi has quit IRC | 16:50 | |
*** oanson has quit IRC | 16:50 | |
*** oanson has joined #openstack-infra | 16:50 | |
fungi | but let's see if this makes a significant dent first, i guess | 16:51 |
*** smatzek has quit IRC | 16:51 | |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Disable zuulv3 check pipeline https://review.openstack.org/511527 | 16:51 |
clarkb | does that look right? | 16:51 |
pabelanger | looking | 16:52 |
*** yamamoto has quit IRC | 16:52 | |
*** lucasagomes is now known as lucas-afk | 16:52 | |
*** Swami has quit IRC | 16:53 | |
mordred | jeblair: yes! I am here | 16:53 |
pabelanger | yah, think so | 16:53 |
mordred | +2 on turning off v3 check | 16:54 |
*** dimak has joined #openstack-infra | 16:54 | |
*** leyal has joined #openstack-infra | 16:54 | |
*** lihi has joined #openstack-infra | 16:55 | |
jeblair | clarkb: zuul reported back with an expected post-fail. that means the syntax check passed. i say force-merge it now. | 16:56 |
mordred | ++ | 16:56 |
clarkb | ok on it | 16:56 |
*** smatzek has joined #openstack-infra | 16:57 | |
pabelanger | periodic is also large on zuulv3 (303 patches), so zuulv3 might start processing them with no check | 16:57 |
clarkb | do we want to disable periodic too? | 16:57 |
jeblair | good point. i'm in favor of disabling periodic | 16:58 |
openstackgerrit | Merged openstack-infra/project-config master: Disable zuulv3 check pipeline https://review.openstack.org/511527 | 16:58 |
clarkb | ok working on a patch for periodic now | 16:58 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add a toggle to disable ARA static report generation https://review.openstack.org/511528 | 16:58 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add a toggle to enable saving the ARA sqlite database https://review.openstack.org/511529 | 16:58 |
dmsimard | infra-root ^ | 16:58 |
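For context, a minimal sketch of what such a toggle could look like inside a log-collection post role. The variable names, the `ara generate html` invocation, and the sqlite path are illustrative assumptions, not the actual contents of 511528/511529:

```yaml
# Hypothetical post-run tasks on the executor, guarded by toggles.
- name: Generate the ARA static HTML report
  command: "ara generate html {{ zuul.executor.log_root }}/ara"
  when: ara_generate_report | default(true) | bool

- name: Save only the raw ARA sqlite database instead of a full report
  copy:
    src: "{{ ansible_user_dir }}/.ara/ansible.sqlite"
    dest: "{{ zuul.executor.log_root }}/ara-report/ansible.sqlite"
  when: ara_save_database | default(false) | bool
```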
*** derekh has quit IRC | 16:58 | |
*** andreas_s has quit IRC | 17:00 | |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Similarly to disabling check, disable periodic https://review.openstack.org/511530 | 17:00 |
clarkb | there is periodic | 17:00 |
*** Goneri has joined #openstack-infra | 17:01 | |
*** baoli has quit IRC | 17:01 | |
jeblair | clarkb: i think you need to leave the pipelines and change the trigger to "trigger: {}" | 17:02 |
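A rough sketch of that shape: the pipeline stays defined so project references remain valid, but its trigger is emptied so nothing gets enqueued (surrounding pipeline options abbreviated and assumed):

```yaml
- pipeline:
    name: periodic
    manager: independent
    precedence: low
    # an empty trigger means no events ever enqueue changes here
    trigger: {}
```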
clarkb | thanks | 17:02 |
clarkb | (just saw the error with jobs trying to use a pipeline that no longer exists) | 17:02 |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Similarly to disabling check, disable periodic https://review.openstack.org/511530 | 17:03 |
*** panda|rover is now known as panda|rover|off | 17:04 | |
mordred | dmsimard: both lgtm | 17:04 |
*** caphrim007 has joined #openstack-infra | 17:05 | |
*** eroux has joined #openstack-infra | 17:06 | |
*** priteau has quit IRC | 17:06 | |
*** baoli has joined #openstack-infra | 17:07 | |
*** jbadiapa has quit IRC | 17:07 | |
*** baoli has quit IRC | 17:09 | |
*** baoli has joined #openstack-infra | 17:10 | |
*** dangers` has quit IRC | 17:10 | |
*** iyamahat has quit IRC | 17:10 | |
*** iyamahat has joined #openstack-infra | 17:10 | |
clarkb | 511530 has no post failures, ready for me to merge it? | 17:10 |
pabelanger | ++ | 17:11 |
*** baoli has quit IRC | 17:11 | |
*** tesseract has quit IRC | 17:11 | |
*** baoli has joined #openstack-infra | 17:11 | |
mordred | clarkb: wfm | 17:12 |
openstackgerrit | Merged openstack-infra/project-config master: Similarly to disabling check, disable periodic https://review.openstack.org/511530 | 17:12 |
clarkb | mordred: want me to remove you from project bootstrappers when I remove myself? | 17:12 |
*** sree has joined #openstack-infra | 17:12 | |
*** dangers has joined #openstack-infra | 17:13 | |
*** caphrim007_ has joined #openstack-infra | 17:13 | |
pabelanger | just citycloud-sto2 and infracloud (both regions) left for latest ubuntu-xenial DIB | 17:13 |
mordred | clarkb: yes please | 17:13 |
pabelanger | we should be seeing some results of bindep-fallback.txt already | 17:13 |
clarkb | mordred: done | 17:13 |
pabelanger | going to try and find a log | 17:13 |
*** ociuhandu has quit IRC | 17:14 | |
pabelanger | boden: which review did you see the failure on? | 17:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Temporarily disable volume and os_image functional tests https://review.openstack.org/508156 | 17:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Fix image task uploads https://review.openstack.org/511532 | 17:15 |
mordred | Shrews: ^^ those should help/fix the image upload issue | 17:15 |
pabelanger | boden: okay, I found 510224 | 17:16 |
boden | pabelanger anything recent in neutron-lib… for example https://review.openstack.org/#/c/502416/ https://review.openstack.org/#/c/510224/ | 17:16 |
*** caphrim007 has quit IRC | 17:16 | |
pabelanger | boden: thanks | 17:16 |
mordred | Shrews: or at least help with the log spam - since I think the uploads are actually accidentally occurring correctly whilst we log a bunch of errors - but logging errors while a thing works cloud-side means finding real errors is unpossible | 17:17 |
clarkb | df shows IFree appears to be slowly increasing | 17:17 |
clarkb | course now that i have said that... | 17:17 |
*** sree has quit IRC | 17:17 | |
clarkb | also would be nice if du had -i on that server, oh well | 17:18 |
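Lacking `du -i`, one rough workaround (a sketch assuming GNU find and shell access on the server) is to count directory entries per top-level directory as a proxy for inode usage:

```shell
# Entry counts per top-level log directory, largest first.
cd /srv/static/logs
for d in */; do
    printf '%10d %s\n' "$(find "$d" | wc -l)" "$d"
done | sort -rn | head -20
```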
mordred | clarkb: yah- it rises for a bit and then gets cratered | 17:18 |
*** dhinesh has quit IRC | 17:18 | |
boden | pabelanger: also still seeing issues with vmware-nsx… ex: https://review.openstack.org/#/c/509661/ | 17:19 |
clarkb | mordred: its over 100k now at least | 17:19 |
mordred | clarkb: \o/ | 17:19 |
pabelanger | boden: okay, just confirming it is fixed with 510224 | 17:19 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Add override-branch to all periodic jobs https://review.openstack.org/511533 | 17:19 |
pabelanger | boden: I would also suggest adding your own bindep.txt to both projects, and figuring out which OS packages you need bindep to install. Possibly you might be able to mitigate the errors from today, since you are using bindep-fallback.txt right now | 17:20 |
boden | pabelanger: ok, I’ll have to read up on that | 17:21 |
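For reference, bindep.txt is just a list of distro package names with optional platform profiles; the entries below are purely illustrative, not the actual dependencies of neutron-lib or vmware-nsx:

```
gcc
libffi-dev [platform:dpkg]
libffi-devel [platform:rpm]
libssl-dev [platform:dpkg]
openssl-devel [platform:rpm]
```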
clarkb | wow and back down to 80k ish | 17:21 |
AJaeger | we also had broken periodic jobs in v3, see 511533 - we didn't specify override-branch and just ran the job for each branch... | 17:21 |
pabelanger | will need to recheck 510224, all jobs ran on rackspace | 17:22 |
*** slaweq_ has joined #openstack-infra | 17:23 | |
inc0 | good morning, zuulv3 is down? | 17:23 |
pabelanger | yes, we are stopping check pipelines until we can recover logs.o.o | 17:23 |
pabelanger | (on zuulv3) | 17:23 |
inc0 | ok | 17:23 |
pabelanger | clarkb: should we restart zuulv3 to dump pipelines? Or let it run out | 17:23 |
clarkb | oh right it won't dump them on its own | 17:24 |
clarkb | jeblair: ^ what do you think? | 17:24 |
AJaeger | team, we have 303 changes currently in periodic and 117 in check | 17:24 |
jeblair | clarkb: yes we should | 17:24 |
AJaeger | pabelanger: yes! | 17:24 |
*** slaweq_ has quit IRC | 17:24 | |
jeblair | i will do it | 17:24 |
clarkb | jeblair: thanks | 17:24 |
pabelanger | ++ | 17:24 |
clarkb | and now down to 45k-ish inodes | 17:24 |
clarkb | so ya not keeping up | 17:25 |
pabelanger | Shrews: just citycloud-sto2 left for ubuntu-xenial DIB | 17:25 |
pabelanger | and rackspace of course | 17:25 |
jeblair | zuulv3 restarting | 17:25 |
*** felipemonteiro has joined #openstack-infra | 17:26 | |
*** links has quit IRC | 17:27 | |
clarkb | we are back down to 0 free inodes | 17:27 |
pabelanger | okay, confirmed gnutls package is no longer breaking on xenial with ovh | 17:27 |
pabelanger | telnet://158.69.88.129:19885 | 17:27 |
pabelanger | clarkb: care to +3 511492 | 17:28 |
*** shardy has quit IRC | 17:29 | |
pabelanger | that's what we used for ubuntu DIBs | 17:29 |
jeblair | as best as i can tell, the checksum correction will take more than 7 more hours. | 17:29 |
pabelanger | kk | 17:29 |
clarkb | pabelanger: done | 17:30 |
fungi | pabelanger: excellent job! | 17:30 |
clarkb | watching df -i output I am imagining a game of hungry hungry hippos | 17:30 |
fungi | i take it we're still waiting on uploads to complete elsewhere | 17:30 |
fungi | clarkb: yes, that's a great analogy | 17:31 |
jlk | AJaeger: reviewed. I don't think anything else needs root. | 17:31 |
*** pblaho has quit IRC | 17:31 | |
pabelanger | Yah, as long as we don't vos release our ubuntu AFS mirror, we should be protected until we can repair read/write. Just means we're pinned for a bit | 17:31 |
AJaeger | jlk: great, thanks | 17:31 |
fungi | clarkb: heh, one of my polls (i have watch reporting once a minute on inode count for that filesystem) showed 1 free inode | 17:32 |
pabelanger | we also could pause ubuntu DIBs too, if we felt the need | 17:32 |
jlk | AJaeger: You'll need infra root to push it through though. | 17:32 |
*** sree has joined #openstack-infra | 17:32 | |
AJaeger | jlk: why that? | 17:32 |
clarkb | is there an easy way to strace ssh on static.o.o and filter writes to logs/? | 17:32 |
clarkb | and maybe we can see in real time what is going into the fs? | 17:33 |
jlk | or maybe not infra root, but people with more voting rights than I have :D | 17:33 |
mordred | jlk, AJaeger: which change? | 17:33 |
clarkb | clarkb@static:/srv/static/logs/61$ find 509761 | wc -l returns 2840985 | 17:33 |
fungi | clarkb: sshd forks on each incoming connection, so you'd need to -f | 17:33 |
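One possible incantation along those lines (an untested sketch; the exact syscall set and pgrep usage are assumptions): attach to the listening sshd, follow its per-connection forks, and filter for path-bearing calls under the logs tree:

```shell
sudo strace -f -e trace=openat,rename,mkdir,unlinkat \
    -p "$(pgrep -o -x sshd)" 2>&1 \
    | grep --line-buffered '/srv/static/logs/'
```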
AJaeger | jlk: yes, I know... | 17:33 |
pabelanger | clarkb: Oh, I guess we should also disable the check-tripleo pipelines for zuulv3 | 17:33 |
pabelanger | since they will run tripleo jobs | 17:33 |
mordred | clarkb: holy crap | 17:34 |
jeblair | pabelanger: please update commit message on https://review.openstack.org/473911 so we can merge the change. i've already manually removed the crontab entries. | 17:34 |
AJaeger | mordred: https://review.openstack.org/511396 , https://review.openstack.org/511436 , https://review.openstack.org/511435 | 17:34 |
pabelanger | jeblair: ack | 17:34 |
clarkb | a single tripleo change is .4% of our inode total | 17:34 |
SamYaple | haha wow | 17:35 |
*** sshnaidm is now known as sshnaidm|off | 17:35 | |
AJaeger | jeblair: let me do a proposal for periodic translation jobs... | 17:36 |
jeblair | AJaeger: let's check with mordred and see which approach he favors; he's thought about this more in the context of the migrated jobs | 17:36 |
*** ociuhandu has joined #openstack-infra | 17:36 | |
clarkb | I need to pop out for breakfast now. If someone else is able to get check-tripleo in v3 that would be great | 17:36 |
jeblair | clarkb: can you elaborate? i don't know what you're asking | 17:37 |
mordred | AJaeger, jeblair: those patches look good to me - what's the other approach? | 17:37 |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config master: Remove npm / rubygem crontab entries https://review.openstack.org/473911 | 17:37 |
pabelanger | jeblair: fungi: ^more info in commit message for rubygems / npm | 17:38 |
jeblair | mordred: https://review.openstack.org/511533 i think is the change we're discussing with AJaeger | 17:38 |
AJaeger | mordred, jeblair what about http://paste.openstack.org/show/623489/ instead of https://review.openstack.org/#/c/511436/1/zuul.d/jobs.yaml ? | 17:38 |
jeblair | i've re-enabled all of the crontab entries on mirror-update except ubuntu | 17:38 |
AJaeger | jeblair: problem with that approach is that we have some repos that don't run the job on all branches | 17:38 |
*** felipemonteiro has quit IRC | 17:39 | |
jeblair | AJaeger: you're saying some projects only run propose-translation-update on master, but some run on all branches? in which case, put no branch matcher on the job, but do add one to the project-pipeline invocation of the job. | 17:42 |
jeblair | AJaeger: also, branches can be a yaml list, so you don't have to do regexes any more | 17:42 |
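A sketch of the form being described (project, job, and branch names are placeholders): the job itself carries no branch matcher, and the project-pipeline invocation limits where it runs:

```yaml
- project:
    name: openstack/example-project
    periodic:
      jobs:
        - propose-translation-update:
            branches:
              - master
              - stable/pike
```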
*** SumitNaiksatam has joined #openstack-infra | 17:43 | |
AJaeger | jeblair: most projects run the translation proposal only on master, some on stable/pike and ocata, some only on pike, others only on ocata | 17:43 |
clarkb | jeblair: pabelanger pointed out we are still running check tripleo in zuulv3 | 17:43 |
pabelanger | clarkb: mordred: what if you did http://logs.openstack.org/07/472607/ ? what is the number of inodes on that? | 17:43 |
AJaeger | all depending on whether translations were ready at that time. | 17:43 |
jeblair | this is certainly how new periodoc jobs should be constructed. the question i'd like mordred to weigh in on is whether we should go ahead and do this for these broken legacy jobs, or is the other approach better. | 17:44 |
*** trown|lunch is now known as trown | 17:44 | |
jeblair | mordred you +2d 511533 without any feedback on my comments, so i guess that means he favors your approach. that's fine. | 17:44 |
mordred | wait | 17:44 |
jeblair | wow that didn't make sense | 17:44 |
jeblair | mordred you +2d 511533 without any feedback on my comments, so i guess that means you favor AJaeger'sapproach. | 17:45 |
jeblair | i did not switch conversation partners well | 17:45 |
jeblair | anyway, i need to focus on fires | 17:45 |
mordred | jeblair, AJaeger: I'm sorry, I feel completely lost. I do not understand how branch matchers can have any impact on periodic jobs | 17:45 |
jeblair | so let's pin this for later. | 17:45 |
jeblair | clarkb: i'll take care of it | 17:45 |
*** apetrich has quit IRC | 17:45 | |
mordred | ok. I've removed my vote (which I gave missing jeblair's comments) - and yes, I'd like to pin it for later because I'm not being a good participant in it right now it seems | 17:46 |
jeblair | we need to focus on the 30-hour long ci outage now. | 17:46 |
*** armaan_ has joined #openstack-infra | 17:46 | |
AJaeger | jeblair: ok - we can discuss later, I'll add to etherpad | 17:46 |
mordred | yup | 17:46 |
*** edmondsw has joined #openstack-infra | 17:46 | |
pabelanger | okay, gnutls is under control now. I've seen a few projects working properly again with bindep-fallback.txt | 17:48 |
*** apetrich has joined #openstack-infra | 17:48 | |
pabelanger | what can I help with now? | 17:48 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Disable trigger for v3 check-tripleo pipeline https://review.openstack.org/511540 | 17:48 |
jeblair | pabelanger: did you end up creating and uploading new images? | 17:49 |
*** Swami has joined #openstack-infra | 17:49 | |
pabelanger | jeblair: yes, we used AFS mirrors for packages on the xenial DIB; once we built and uploaded new images, the issue resolved itself | 17:49 |
*** armaan has quit IRC | 17:49 | |
*** dhinesh has joined #openstack-infra | 17:49 | |
jeblair | pabelanger: and theyre uploaded to all regions? | 17:49 |
pabelanger | jeblair: except rackspace, because of a shade bug | 17:50 |
jeblair | ok. and that's fine because they have working old images | 17:50 |
pabelanger | and rackspace is running 3 day old images, that are not affected | 17:50 |
pabelanger | yah | 17:50 |
*** sree has quit IRC | 17:50 | |
Shrews | They should eventually complete in rax | 17:50 |
*** rwsu has quit IRC | 17:51 | |
pabelanger | once https://review.openstack.org/511492/ lands, we also can remove nb04 from emergency file | 17:51 |
pabelanger | I recommend we keep nb03 disabled, and potentially rebuild it in rackspace for faster uploads of DIBs | 17:51 |
*** sree has joined #openstack-infra | 17:51 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add new translation templates https://review.openstack.org/511541 | 17:52 |
mordred | the longest portion of rackspace 'upload' time is actually the image import step I believe (which is where the bug is) | 17:52 |
jeblair | pabelanger, mordred: +3 https://review.openstack.org/511540 ? | 17:52 |
jeblair | fungi: do you need any help with logs? | 17:53 |
mordred | jeblair: done | 17:53 |
pabelanger | should I look at updating base jobs in zuulv3 with ARA disabled? | 17:53 |
jeblair | pabelanger: no | 17:53 |
pabelanger | kk | 17:53 |
jeblair | i'm not ready to talk about anything that isn't directly related to clearing that status message | 17:54 |
*** dhinesh has quit IRC | 17:54 | |
*** dhinesh has joined #openstack-infra | 17:54 | |
*** armaan_ has quit IRC | 17:54 | |
fungi | jeblair: i think i've got it deleting the most effective two things we can hope for at the moment (subtrees of tripleo jobs clarkb identified, and any logs older than 2 weeks). it's _almost_ keeping up now, and as we finish winding down the other high-use pipelines on zuulv3 i have hopes it'll finally gain ground | 17:54 |
*** armaan has joined #openstack-infra | 17:55 | |
*** slaweq_ has joined #openstack-infra | 17:55 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Add new translation templates https://review.openstack.org/511541 | 17:55 |
fungi | i haven't seen it break 100k free inodes yet (polling once a minute) but it's been a little while since i've seen it at 0 free (i've seen a few sub-1k though) | 17:56 |
*** sree has quit IRC | 17:56 | |
*** baoli has quit IRC | 17:56 | |
fungi | as opposed to earlier where it was basically pegged to 0 free on every poll | 17:56 |
fungi | no, wait, there it just came back 0 again | 17:56 |
SamYaple | fungi: you jinxed it! | 17:56 |
fungi | indeed :/ | 17:56 |
pabelanger | okay, I'll work on getting system-config jobs working | 17:56 |
*** baoli has joined #openstack-infra | 17:57 | |
pabelanger | looks like openstackci-beaker is failing | 17:57 |
jeblair | pabelanger: thanks | 17:57 |
*** jascott1 has joined #openstack-infra | 17:57 | |
mordred | as soon as the latest patch from jeblair lands we should likely restart zuul again to clear that pipeline yeah? | 17:57 |
jeblair | mordred: there's only one thing in it now, so if it lands soon, no big deal | 17:57 |
fungi | probably so | 17:57 |
mordred | oh - ok. cool | 17:57 |
fungi | oh, good point we already restarted so that cleared it out anyway | 17:58 |
*** camunoz has quit IRC | 17:58 | |
fungi | and there hasn't been a lot of time for it to accumulate again | 17:58 |
*** armaan has quit IRC | 17:59 | |
*** trown is now known as trown|brb | 18:00 | |
mordred | Shrews: https://review.openstack.org/#/c/508156/ came back green - wanna +A it? | 18:01 |
mordred | Shrews: the follow up also is mostly green - the red is POST_FAILURE from the current incident | 18:01 |
*** dangers has quit IRC | 18:02 | |
*** eharney has quit IRC | 18:03 | |
openstackgerrit | Paul Belanger proposed openstack-infra/puppet-openstack_infra_spec_helper master: Cap signet < 0.8.0 https://review.openstack.org/511543 | 18:03 |
fungi | huh, no more rsync processes on static.o.o for the past few minutes, and now we're up over 100k free inodes | 18:04 |
fungi | over 200k free | 18:05 |
*** dprince has quit IRC | 18:05 | |
mordred | fungi: I haven't seen that number over 200k in a WHILE | 18:07 |
pabelanger | jeblair: okay, I've confirmed ubuntu-trusty is also affected by the gnutls issue. I've started an image-build for ubuntu-trusty now | 18:07 |
*** dangers has joined #openstack-infra | 18:08 | |
*** ijw has joined #openstack-infra | 18:08 | |
*** dhinesh has quit IRC | 18:08 | |
*** dhinesh has joined #openstack-infra | 18:09 | |
*** trown|brb is now known as trown | 18:09 | |
jeblair | i'm looking into what's holding up 511396 | 18:09 |
*** dhinesh has quit IRC | 18:09 | |
pabelanger | Yah, was just taking a peek myself | 18:10 |
pabelanger | we have a lot of ready nodes on nl01, and few currently building | 18:10 |
jeblair | slow building node in inap-mtl01 | 18:11 |
pabelanger | mordred: fungi: clarkb: https://review.openstack.org/511543/ is the fix for system-config openstackci-beaker jobs, if you'd like to review. I'm working on fixing ubuntu-trusty issue now | 18:11 |
pabelanger | jeblair: maybe trying to boot new xenial image | 18:11 |
fungi | thanks pabelanger! | 18:11 |
jeblair | pabelanger: does that take 30m? | 18:12 |
pabelanger | jeblair: yah, I've seen upwards of an hour | 18:12 |
mgagne | jeblair: have new images been uploaded? is Nodepool playing catch up since yesterday? | 18:12 |
jeblair | we should maybe set the launch timeout to 10m there like rax | 18:12 |
fungi | i suppose it could if there's a thundering herd on the storage distribution network warming nova image caches | 18:12 |
pabelanger | jeblair: +1 | 18:13 |
jeblair | mgagne: yes new images | 18:13 |
pabelanger | inap boots fast, so 10mins should be plenty | 18:13 |
*** camunoz has joined #openstack-infra | 18:14 | |
Shrews | mordred: +2'd the first, the follow up has a comment from me | 18:14 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Set v3 nodepool inap timeout to 600 https://review.openstack.org/511545 | 18:14 |
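For context, a trimmed illustration of the nodepool.yaml knob that review touches; everything except the relevant key is omitted:

```yaml
providers:
  - name: inap-mtl01
    # give slow-booting nodes up to 10 minutes before giving up
    launch-timeout: 600
```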
mordred | Shrews: thanks | 18:14 |
pabelanger | +2 | 18:15 |
jeblair | okay, that should eventually clear. i don't think disabling v3 check-tripleo is urgent enough to do anything other than just check back in a bit. | 18:15 |
pabelanger | great | 18:16 |
jeblair | fungi: how much headroom do you think we need before we can send an all-clear? | 18:16 |
fungi | watching what jobs are uploading logs in real-time (by grepping the process list for rsync) i just saw a gate-tripleo-ci-centos-7-3nodes-multinode-nv build suck up 10k inodes | 18:16 |
Shrews | mordred: oh, maybe what you've done in that exception format will work (assumes 'message' is an attribute, right?) | 18:17 |
fungi | jeblair: a sane amount would be when we get down to 99% inode consumption maybe? like around 7.7m free | 18:17 |
fungi | jeblair: right now we're at 0.02% free | 18:18 |
*** jdandrea has quit IRC | 18:18 | |
openstackgerrit | Paul Belanger proposed openstack-infra/puppet-openstack_infra_spec_helper master: Add bindep.txt file https://review.openstack.org/511546 | 18:19 |
mordred | Shrews: yes - just replied with that - status is a dict with message as a key - but we could change it to the other syntax if you prefer | 18:19 |
Shrews | mordred: yeah, i'd rather be explicit than clever :) | 18:19 |
openstackgerrit | Merged openstack-infra/project-config master: Install zanata dependencies as root https://review.openstack.org/511396 | 18:20 |
openstackgerrit | Merged openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace https://review.openstack.org/511492 | 18:20 |
fungi | a gate-tripleo-ci-centos-7-containers-multinode build just now uploaded 8k files | 18:20 |
clarkb | almost 2k of that is various ara things iirc, and etc is another 1k or so; then multiply etc by the number of nodes | 18:21 |
fungi | and right after that i saw a gate-tempest-dsvm-py35-ubuntu-xenial job upload 600 files | 18:21 |
fungi | so we're talking over an order of magnitude higher inode counts from tripleo jobs than devstack-gate jobs | 18:22 |
pabelanger | yah | 18:22 |
fungi | i saw a gate-tempest-dsvm-ironic-ipa-partition-bios-pxe_ipmitool-coreos-src-ubuntu-xenial build upload 500 files | 18:23 |
fungi | (that's one heck of a job name!) | 18:23 |
clarkb | ya multinode tempest/grenade is in the 1k range | 18:24 |
mordred | Shrews: kk. update coming | 18:25 |
*** anupn has quit IRC | 18:26 | |
*** baoli_ has joined #openstack-infra | 18:31 | |
*** dhinesh has joined #openstack-infra | 18:31 | |
pabelanger | mordred: clarkb: care to +3 https://review.openstack.org/511545/ for inap launch-timeout 600 | 18:31 |
openstackgerrit | Merged openstack-infra/project-config master: Disable trigger for v3 check-tripleo pipeline https://review.openstack.org/511540 | 18:32 |
mordred | pabelanger: done | 18:32 |
*** baoli has quit IRC | 18:33 | |
fungi | i just saw a legacy-tempest-dsvm-neutron-ovsfw build upload logs... i guess the orphaned nodes from the zuulv3 restart are still chugging along | 18:34 |
fungi | that might explain why cleanup hasn't sped up just yet | 18:34 |
openstackgerrit | Merged openstack-infra/project-config master: Set v3 nodepool inap timeout to 600 https://review.openstack.org/511545 | 18:36 |
jeblair | fungi: ah, yeah, we may still have the bug where a scheduler restart doesn't abort executor jobs | 18:36 |
jeblair | though it should cause them to get deleted. | 18:36 |
jeblair | the nodes i mean | 18:37 |
pabelanger | ubuntu-trusty DIB now compressing | 18:37 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Fix image task uploads https://review.openstack.org/511532 | 18:38 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Add group parameter to create_server https://review.openstack.org/511305 | 18:38 |
jeblair | fungi: at this point you should see no more legacy- jobs upload | 18:38 |
mordred | Shrews: k. that should fix your comment | 18:38 |
jeblair | fungi: the only nodepool v3 nodes in use are for infra-post jobs | 18:38 |
jeblair | fungi: and the change to disable check-tripleo in v3 has landed | 18:38 |
jeblair | i'm going to afk for about an hour for lunch, etc. | 18:39 |
mordred | jeblair: have good lunching | 18:39 |
clarkb | what time did the tripleo ansible tmp fix get in yesterday? mordred do you recall? | 18:40 |
mordred | clarkb: I do not - I can go look though | 18:40 |
mordred | clarkb: I made the Stop collecting ephemeral temp dirs patch at around 22:16 - which is right around the time we force-merged the other tmp patches | 18:41 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 18:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Add method to set bootable flag on volumes https://review.openstack.org/502479 | 18:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Allow domain_id for roles https://review.openstack.org/496992 | 18:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Move role normalization to normalize.py https://review.openstack.org/500170 | 18:42 |
clarkb | mordred: thanks | 18:43 |
mordred | Shrews: ^^ if you get a sec, those 4 have been on hold due to other gate issue, but would be nice to have if we're gonna cut a new release for the upload bug | 18:43 |
mordred | Shrews: (3 of them are for fixing bugs humans have reported running in to) | 18:43 |
clarkb | gate-tripleo-ci-centos-7-ovb-ha-oooq has added more /etc collection in the last 7 days or so | 18:43 |
clarkb | overcloud-*/etc seems to be the bulk of it that is new | 18:44 |
*** vsaienk0 has joined #openstack-infra | 18:44 | |
clarkb | went from ~4k to ~37k | 18:45 |
clarkb | 23k or so of that is the ansible tmp stuff | 18:45 |
clarkb | then good chunk of the rest looks like etc | 18:45 |
Shrews | mordred: ack | 18:46 |
clarkb | also /var/log/extra and /var/log/config-data | 18:48 |
clarkb | we are copying all of the apache modules multiple times (basically once per openstack service?) | 18:49 |
clarkb | EmilienM: logs/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/var/log/config-data/nova/etc/httpd/conf.modules.d is probably an easy sort of thing to just stop collecting | 18:50 |
clarkb | EmilienM: are we using a whitelist for logs yet? | 18:50 |
clarkb | but its in logs/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/var/log/config-data/heat_api/etc/httpd/conf.modules.d as well and so on | 18:51 |
openstackgerrit | Merged openstack-infra/shade master: Temporarily disable volume and os_image functional tests https://review.openstack.org/508156 | 18:52 |
clarkb | logs/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/var/log/config-data/keystone/etc/httpd/conf.modules.d | 18:53 |
*** vsaienk0 has quit IRC | 18:54 | |
clarkb | looks like we also copy the system systemd units | 18:55 |
EmilienM | o/ | 18:55 |
EmilienM | clarkb: yes we have whitelist and exclude | 18:55 |
clarkb | EmilienM: why are we copying all of the apache modules multiple times then? | 18:55 |
EmilienM | weshay|ruck: can you take a look please? I'm in a call right now | 18:55 |
clarkb | and systemd system units? | 18:55 |
EmilienM | clarkb: I don't know now | 18:55 |
weshay|ruck | aye | 18:56 |
weshay|ruck | EmilienM, k | 18:56 |
EmilienM | ty | 18:56 |
weshay|ruck | clarkb, I have an email out for review w/ a few patches for logs on openstack-dev | 18:57 |
pabelanger | likely drop SSH host keys in http://logs.openstack.org/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/etc/ssh/ | 18:57 |
pabelanger | :( http://logs.openstack.org/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/etc/sysconfig/network-scripts/ | 18:58 |
pabelanger | don't think we need all of sysconfig/network-scripts too | 18:58 |
*** camunoz has quit IRC | 18:58 | |
pabelanger | weshay|ruck: clarkb: lets create a topic in gerrit so we can review them | 18:59 |
weshay|ruck | https://github.com/openstack-infra/tripleo-ci/blob/master/toci-quickstart/config/collect-logs.yml#L151 | 18:59 |
weshay|ruck | ya.. I'll nuke that | 18:59 |
*** slaweq_ has quit IRC | 18:59 | |
pabelanger | Yah, I think we should be whitelisting specific files, not just directories | 18:59 |
clarkb | logs/61/509761/2/check/gate-tripleo-ci-centos-7-containers-multinode/99f9196/logs/subnode-2/etc/selinux is another big consumer | 19:00 |
clarkb | pabelanger: yes that is what we've been asking for since like march | 19:00 |
pabelanger | clarkb: agree | 19:00 |
pabelanger | okay, I have to run out for an errand | 19:00 |
pabelanger | I will try to be back shortly | 19:00 |
fungi | worth noting, tripleo isn't the only team with high-inode-count build logs... i just saw a gate-openstack-ansible-os_nova-ansible-func-ubuntu-xenial build upload 5k files | 19:02 |
Shrews | mordred: reviewed the shade changes. all look good except for one | 19:03 |
*** ihrachys_ has joined #openstack-infra | 19:03 | |
*** harlowja has quit IRC | 19:03 | |
*** slaweq_ has joined #openstack-infra | 19:03 | |
clarkb | fungi: do you have the full path to that? i'd be curious to go see what tehy are grabbing | 19:03 |
fungi | clarkb: /srv/static/logs/44/479844/6/check/gate-openstack-ansible-os_nova-ansible-func-ubuntu-xenial/1541ded/ | 19:03 |
clarkb | thanks | 19:03 |
dmsimard | fungi: OSA have heavy playbooks and use ARA so there's likely a lot of files | 19:03 |
*** rbrndt has quit IRC | 19:03 | |
dmsimard | (because of ARA) | 19:03 |
*** ihrachys has quit IRC | 19:03 | |
fungi | noted | 19:03 |
Shrews | mordred: i think you need to s/payload/kwargs/ in 511305 ? | 19:04 |
clarkb | fungi: ara is 2800 of that | 19:04 |
fungi | yikes. still a lot of files, but ara is over 50%? | 19:04 |
dmsimard | fungi: that's probably not even the heaviest one, gate-openstack-ansible-openstack-ansible-aio-ubuntu-trusty is likely heavier than that | 19:05 |
clarkb | they are also grabbing a lot of stuff out of etc that shouldn't be grabbed | 19:05 |
*** lukebrowning has quit IRC | 19:05 | |
clarkb | and looks like redundant sets possibly | 19:05 |
dmsimard | fungi: wait, wrong job name, hang on. | 19:05 |
*** masber has joined #openstack-infra | 19:05 | |
dmsimard | fungi: gate-openstack-ansible-openstack-ansible-aio-ubuntu-xenial http://logs.openstack.org/21/474721/7/check/gate-openstack-ansible-openstack-ansible-aio-ubuntu-xenial/d8cdf1d/logs/ara/ | 19:06 |
*** slaweq_ has quit IRC | 19:06 | |
*** baoli_ has quit IRC | 19:06 | |
*** slaweq_ has joined #openstack-infra | 19:06 | |
dmsimard | that one likely has a bunch of files :( | 19:07 |
fungi | dmsimard: oh, yeah, /srv/static/logs/21/474721/7/check/gate-openstack-ansible-openstack-ansible-aio-ubuntu-xenial/d8cdf1d contains 10k files, so rivalling tripleo jobs | 19:07 |
*** AJaeger has quit IRC | 19:07 | |
*** lukebrowning has joined #openstack-infra | 19:07 | |
dmsimard | gate-openstack-ansible-openstack-ansible-ceph-ubuntu-xenial should be on about the same level | 19:07 |
*** SumitNaiksatam has quit IRC | 19:08 | |
pabelanger | okay, ubuntu-trusty DIBs uploading | 19:08 |
fungi | and we're back under 10k free inodes. so we're really still not keeping pace with the rate at which new builds are uploading logs (or maybe only barely) | 19:08 |
pabelanger | afk now | 19:08 |
clarkb | maybe it's worth a general email explaining that we shouldn't be copying all of /etc | 19:09 |
clarkb | but ya ara is the bigger chunk of the pie for osa at least | 19:09 |
*** AJaeger has joined #openstack-infra | 19:09 | |
dmsimard | I'll try and think of the plumbing involved in shifting from static reports to sqlite to something central | 19:09 |
*** camunoz has joined #openstack-infra | 19:10 | |
*** masber has quit IRC | 19:10 | |
fungi | how terrible would ara performance be if the report files were passed around as a bundle (tarball or something) and unpacked on the fly? | 19:11 |
*** baoli has joined #openstack-infra | 19:11 | |
fungi | i guess you'd need some backend support to deal with that, or end up transferring all the data to the browser as a giant blob up-front | 19:12 |
clarkb | odyssey4me: ^ fyi, any chance you can workto clean up the collection of /etc in your jobs? | 19:12 |
fungi | 3 inodes free :/ | 19:12 |
dmsimard | fungi: unpacked on the fly? I've never done something like this before -- right now every file is gzipped individually and then there are the necessary mime types to make the webserver extract them on the fly | 19:13 |
fungi | yeah, we're back to returning POST_FAILURE again | 19:13 |
clarkb | odyssey4me: http://logs.openstack.org/44/479844/6/check/gate-openstack-ansible-os_nova-ansible-func-ubuntu-xenial/1541ded/logs/etc/ has 2111 inodes in use and a bunch of that is copying stuff that isn't really relevant to the jobs | 19:13 |
dmsimard | fungi: We would need some sort of middleware ? | 19:13 |
dmsimard | evrardjp, cloudnull ^ see clarkb's question | 19:13 |
fungi | dmsimard: yeah, probably unless you serialized all the report data into a single file | 19:13 |
fungi | which would likely be a big hit browser-side, i'm guessing | 19:14 |
fungi | (hit to performance, not hit on the solid gold singles chart) | 19:14 |
* cloudnull reading | 19:15 | |
*** gouthamr has quit IRC | 19:15 | |
dmsimard | fungi: It's not very realistic, no, there's too much data to display everything in one single file. I'll try and think of something relative to the sqlite database instead. The sqlite database is several orders of magnitude smaller than even the gzipped static report, not to mention it's just one file. | 19:16 |
clarkb | oh nice that sounds like win win win | 19:16 |
clarkb | dmsimard: is that a useable feature today? we just have to turn it on? | 19:16 |
cloudnull | ^ I can go kick that out to the tests repo, if so | 19:17 |
fungi | dmsimard: oh! neat, i didn't realize sqlite could do multiple tables in one file, but i'll admit i've done very little with it so far | 19:17 |
cloudnull | clarkb: we have the log/config collection tasks within the tests role. are we needing to just prune that back? | 19:18 |
dmsimard | clarkb: Well the sqlite database already exists, that's where the callback saves its data and from where the web interface reads it. The static report generation is more or less a crawler that crawls all the pages of the interface and generates static files out of every page. | 19:18 |
clarkb | cloudnull: ya if we could stop grabbing all of /etc and multiple copies of it that would be good. | 19:19 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Update some legacy jobs https://review.openstack.org/511555 | 19:19 |
evrardjp | cloudnull: I am off for today, but if you're doing something to turn the whole etc collection into an archive, that would be great | 19:19 |
clarkb | cloudnull: it's fine to copy things relevant to the job, openstack service logs and config or whatever | 19:19 |
evrardjp | clarkb: it's multiple copies because it's multiple "hosts" | 19:19 |
clarkb | cloudnull: but do you really need all of rc.* and fonts and logrotate and so on | 19:19 |
dmsimard | fungi: BonnyCI ran with a mysql database of like 42 000 playbook runs :) | 19:19 |
evrardjp | we generally need it, but it can definitely be an archive | 19:19 |
clarkb | evrardjp: right, the problem is that stuff like ^ is all going to be identical and has no relevance to the job really | 19:19 |
clarkb | evrardjp: it's fine to copy the bits that are relevant to the job and different, like openstack service config | 19:20 |
evrardjp | exactly | 19:20 |
evrardjp | I agree | 19:20 |
evrardjp | plus one or two locations, like apt sources | 19:20 |
evrardjp | or yum repos | 19:20 |
dmsimard | fungi: the challenge here is to go from a sqlite database saved on logs.o.o to an interface, somehow -- whether that's a centralized instance, or something generated on the fly from that database | 19:20 |
evrardjp | the rest doesn't matter | 19:20 |
evrardjp | and in all cases we can iterate later to add some small stuff | 19:20 |
cloudnull | yea I think we did an /etc/.* just because it was easy, we could be a lot more tactical | 19:21 |
evrardjp | but I think we should generally not ship those files directly | 19:21 |
evrardjp | we should just archive those | 19:21 |
dmsimard | fungi: ultimately, it's sqlalchemy with a sqlite connection string -- sqlite is usually on the filesystem. Maybe we can work out something that uses the sqlite database over http or something like that. | 19:21 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Update some legacy jobs https://review.openstack.org/511555 | 19:21 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove legacy-python{34,35} jobs https://review.openstack.org/511557 | 19:21 |
evrardjp | if we want the detail, we download and unarchive | 19:21 |
clarkb | evrardjp: sounds like a plan then? prune and archive? | 19:21 |
evrardjp | archive and don't even collect | 19:21 |
evrardjp | so when the instance is destroyed, we don't care anymore | 19:22 |
fungi | dmsimard: yeah, still needs some backend support, like a filter callout in apache maybe (that's how we do the fancy clickable log stuff with os_loganalyze) | 19:22 |
evrardjp | I had a very long day, so I'd be happy if someone can take over this... cloudnull? | 19:22 |
cloudnull | sure thing | 19:23 |
cloudnull | looks like we just need to adjust https://github.com/openstack/openstack-ansible-tests/blob/master/test-log-collect.sh#L40-L62 | 19:23 |
cloudnull | which will take the pressure off every one of our roles | 19:23 |
cloudnull | for now we could comment all that out | 19:23 |
cloudnull | and then work it back in | 19:24 |
evrardjp | same for var log, we can only keep what's interesting for us | 19:24 |
evrardjp | anything that can unblock others is good... But we still need logs in the end, because losing them reduces our ability to use gate results, and we don't want to spend cycles for nothing either :p | 19:26 |
jeblair | fungi: back. looks like there's little progress on inodes? | 19:26 |
clarkb | ya there is a balance to be reached | 19:26 |
clarkb | with devstack-gate we try to add things when we notice we need them and be specific | 19:26 |
clarkb | and we remove things as we notice they aren't useful too | 19:27 |
clarkb | rather than just wholesale copying (so I think pruning and archiving to a single file is a big win there, thanks) | 19:27 |
*** andreww has quit IRC | 19:27 | |
evrardjp | clarkb: yeah, I guess here we noticed "we need /etc/<something> " | 19:27 |
evrardjp | and then yes we need /etc/<somethingelse> | 19:27 |
fungi | jeblair: yes and no. i saw it fall all the way back to 0 but now we're nearing 300k free again | 19:27 |
evrardjp | and then it finished to be a boatload of things | 19:27 |
evrardjp | which is obviously wrong :p | 19:27 |
jeblair | fungi: what's the next most dramatic step we can take? | 19:28 |
*** xarses has joined #openstack-infra | 19:28 | |
fungi | jeblair: a few potential options: artificially constrain our nodepool quota, disable some of the top-offender jobs, or delete entire subtrees of the filesystem | 19:28 |
jeblair | fungi: oh! what if i produced a list of v3 check jobs from zuul logs and we just rm-rfd those paths? | 19:29 |
*** eharney has joined #openstack-infra | 19:29 | |
clarkb | jeblair: ++ | 19:29 |
*** baoli has quit IRC | 19:29 | |
fungi | jeblair: maybe... deleting jobs by name still seems to go pretty slowly mainly because there are a lot of wildcarded parent directories to get to the job names | 19:29 |
cloudnull | evrardjp: https://review.openstack.org/511560 | 19:30 |
jeblair | fungi: i'm talking exact paths | 19:30 |
jeblair | fungi: i should have said 'build' rather than 'job' :) | 19:30 |
*** baoli has joined #openstack-infra | 19:30 | |
fungi | jeblair: oh, yeah i could probably loop over those pretty easily | 19:30 |
jeblair | lemme see what i can produce | 19:30 |
cloudnull | we can add it back once there's less pressure, and we have some time to think about everything that we might really need. | 19:30 |
*** camunoz has quit IRC | 19:31 | |
fungi | thanks jeblair! | 19:32 |
evrardjp | cloudnull: ok. Alternative would be to tar them | 19:32 |
evrardjp | let's already do this | 19:32 |
fungi | i still worry that tarring up files you don't know for sure you need just avoids doing the actual work of figuring out what information is actually useful | 19:32 |
cloudnull | ++ | 19:33 |
cloudnull | I think it'd be better to get a list together of what we really need | 19:33 |
fungi | and makes it easier to never get around to working through that | 19:33 |
evrardjp | fungi: oh yes, I mean tarring only what's useful | 19:33 |
fungi | oh, got it. that would be cool as long as they're not useful to browse directly on a frequent basis | 19:34 |
evrardjp | cloudnull: I updated the commit message with the reason and let's do it | 19:34 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 19:34 |
cloudnull | we could do something like an include list from file and use that with our existing rsync commands so that its easy to add and remove as needed. | 19:34 |
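One way that include-list idea could look, as a sketch only; the list contents, the `TEST_HOST`/`WORKSPACE` variables, and the rsync flags are assumptions rather than what openstack-ansible-tests actually does:

```shell
# Pull only explicitly listed paths from the test host instead of all of /etc.
cat > /tmp/log-collect-list.txt <<'EOF'
etc/nova/nova.conf
etc/neutron/neutron.conf
var/log/nova
EOF
# --files-from disables the recursion implied by -a, so pass -r explicitly
# if listed directories should be copied in full.
rsync -az -r --files-from=/tmp/log-collect-list.txt \
    "${TEST_HOST}:/" "${WORKSPACE}/logs/"
```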
ianw | jeblair / pabelanger : looks like current status on the mirror is trying to rebuild the checksums.db, is that right? | 19:34 |
evrardjp | cloudnull: yes, that's what I thought, basically saying WHAT we really want to collect. | 19:34 |
cloudnull | but it'd be good to circulate that with the osa community so that we make sure we get everything useful for our folks | 19:34 |
evrardjp | let me merge your patch quick then | 19:34 |
jeblair | ianw: yes. i have added space to partition and added quota to volume, rebooted all servers involved, and am running the checksum rebuild with a db directory on local disk. | 19:35 |
cloudnull | evrardjp: ok. | 19:35 |
*** andreas_s has joined #openstack-infra | 19:35 | |
jeblair | ianw: (that way we avoid afs write errors on the db). | 19:36 |
cloudnull | assuming jenkins doesn't kick us in the teeth that should be merged soon, which will have immediate impact on ALL of our role jobs. | 19:36 |
clarkb | I'm semi-manually going through and clearing out tmp/ansible from tripleo change logs | 19:36 |
clarkb | cloudnull: thanks | 19:36 |
clarkb | cloudnull: I think we are making progress on the inode front so optimistic it will get through | 19:37 |
fungi | oh, wow, we're up over half a million inodes free now! i suspect we owe some of this to job volume falling now that zuul is no longer backlogged | 19:37 |
jeblair | ianw: the immediate gnutls issue has been resolved by uploading new images built from our mirror. | 19:37 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Add group parameter to create_server https://review.openstack.org/511305 | 19:37 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 19:37 |
clarkb | fungi: ya and I've cleared out about 100k so far | 19:37 |
cloudnull | sorry for the issues fungi clarkb. | 19:37 |
*** florianf has quit IRC | 19:38 | |
jeblair | clarkb, fungi: should we delete all v3 check and check-tripleo pipeline builds? | 19:38 |
ianw | jeblair: excellent, thanks; i am glad that works. dib's "use this mirror during build" is maybe not as robust as i'd like | 19:39 |
jeblair | ianw: you can see recent project-config changes merged to do that if you want to retro-review it | 19:39 |
clarkb | jeblair: I think that would make a significant impact, I'd be in favor | 19:39 |
mordred | jeblair: yah. I'm also in favor | 19:40 |
fungi | jeblair: sure, i expect a majority of them to exhibit failures for issues we've since fixed | 19:40 |
*** andreas_s has quit IRC | 19:40 | |
jeblair | i have a list of 132272 zuulv3 builds. 119842 of which are check* | 19:40 |
fungi | so probably could stand fresh check results anyway | 19:40 |
ianw | and i see system-config seems green, so that's good too | 19:40 |
jeblair | fungi: static.openstack.org:~corvus/log-delete | 19:42 |
fungi | we've just reached 0.1% free inodes now, so about 1/10th of what i'd like to see freed up before we #status ok | 19:42 |
fungi | thanks jeblair! i'll start culling those | 19:42 |
jeblair | fungi: cool, thanks | 19:42 |
fungi | worth noting, my deletion of /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/{und | 19:42 |
fungi | ercloud/tmp/ansible,ara_oooq,undercloud/etc} | 19:42 |
fungi | finally completed | 19:42 |
ianw | cool, the only reason not to use DIB_DISTRIBUTION_MIRROR is that it leaves the mirror behind in the image. which doesn't matter in this case, but i didn't feel was suitable for the general case | 19:42 |
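For reference, a minimal sketch of driving that variable by hand; the mirror URL and element list are illustrative:

```shell
# Point the ubuntu-minimal element at a nearby mirror for the build.
# Unless cleaned up in a later phase, the same URL stays in the image's
# sources.list, which is the caveat being discussed here.
export DIB_DISTRIBUTION_MIRROR=http://mirror.example.openstack.org/ubuntu
disk-image-create -o ubuntu-xenial ubuntu-minimal simple-init vm
```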
mordred | fungi: \o/ | 19:42 |
fungi | yay stray newlines in my clipboard | 19:42 |
dmsimard | clarkb, fungi: I was discussing with a colleague.. looking at https://github.com/openstack-infra/puppet-openstackci/blob/master/files/log_archive_maintenance.sh#L4-L10 would it make sense to do a .tar.gz archive of the whole job logs instead of just gzipping every file ? | 19:43 |
clarkb | dmsimard: the reason to not do that is for browseability in your web browser | 19:43 |
dmsimard | yeah, I get that | 19:43 |
clarkb | if you tarball everything the nyou have to download and extract locally | 19:44 |
dmsimard | but past a certain threshold, meh, I don't know | 19:44 |
clarkb | fungi: oh I think ansible should have been ansible* | 19:44 |
fungi | dmsimard: they cease to be browsable but i suppose we could consider doing that for logs over a week old or something | 19:44 |
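A rough sketch of that "archive anything older than a week" idea; the depth, age, and paths are guesses for illustration, not what the maintenance script currently does:

```shell
# Tar up individual build directories older than 7 days and drop the
# originals; -execdir writes the tarball next to the directory it replaces.
find /srv/static/logs -mindepth 6 -maxdepth 6 -type d -mtime +7 \
    -execdir sh -c 'tar -czf "$1.tar.gz" "$1" && rm -rf "$1"' _ {} \;
```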
ianw | jeblair: was I right that zuulv2 didn't reload to see https://review.openstack.org/#/c/511360/? | 19:44 |
dmsimard | otherwise, we could consider rotating logs off to another node (cold storage) or something | 19:44 |
fungi | clarkb: thanks, i'll add that in a separate pass | 19:44 |
dmsimard | just trying to think of different other things that could help | 19:44 |
clarkb | fungi: actually you can just rm undercloud/tmp | 19:45 |
clarkb | fungi: since the only content there is the ansible related stuff | 19:45 |
dmsimard | fungi: maybe a threshold between 10 and 30 days, I don't know. Just saying the likelihood of someone looking at logs >1 week old gets increasingly smaller, and on that topic, it'd probably be interesting to look at apache logs to get some stats on what people are looking at. | 19:45 |
fungi | clarkb: thanks, that'll help | 19:45 |
ianw | dmsimard: heh, .tar.gz is my oldest "one day i'll fix this" change -> https://review.openstack.org/#/c/122615/ (not related to reducing inodes though) | 19:46 |
fungi | dmsimard: we already were only keeping 30 days and that had us at 95% blocks used | 19:46 |
ianw | "log packages are around 7MiB from my testing. This is big but not ridiculous." i think this is no longer true | 19:46 |
fungi | so we needed to reduce retention for now anyway (and i've effectively dropped it to 14 days with the pass currently underway) | 19:46 |
dmsimard | ianw: that review is very interesting, we were actually discussing something like that earlier fungi and I | 19:47 |
dmsimard | ianw: the problem with ARA is that while it's not big, it's a lot of smaller files and we could perhaps make a .tar.gz and serve that instead | 19:47 |
dmsimard | However I'm looking at another possibility right now, involving just having to store the sqlite database | 19:48 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add a toggle to disable ARA static report generation https://review.openstack.org/511528 | 19:48 |
jeblair | ianw: re zuulv2 and pip8; i don't know. i forgot about that one. | 19:49 |
fungi | jeblair: i'm using your list thusly: cd /srv/static/logs/ ; cat ~corvus/log-delete | xargs rm -rf | 19:49 |
jeblair | fungi: that sounds about right | 19:49 |
fungi | actually, i think i'm going with my earlier plan for safety | 19:50 |
clarkb | we just passed 1 million free | 19:51 |
jeblair | fungi: for loop? | 19:51 |
*** vhosakot has joined #openstack-infra | 19:51 | |
fungi | jeblair: sed s,^,/srv/static/logs/, ~corvus/log-delete | xargs rm -rf | 19:51 |
jeblair | fungi: heh, that one makes me nervous; i'd sed that to a new file and just xargs from that file | 19:52 |
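Spelled out, the safer two-step variant being suggested might look like this (scratch file name assumed):

```shell
sed 's,^,/srv/static/logs/,' ~corvus/log-delete > /tmp/log-delete-abs
# Audit before deleting: no ".." and no doubled slashes from absolute paths.
grep -e '\.\.' -e '//' /tmp/log-delete-abs || echo "list looks clean"
xargs rm -rf < /tmp/log-delete-abs
```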
fungi | jeblair: fair, i just re-audited the file to make sure it contains no absolute paths | 19:52 |
fungi | and no ".." | 19:53 |
fungi | deletion underway just catting the file and treating them as relative paths | 19:53 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Add publish-deploy-guide job https://review.openstack.org/511563 | 19:54 |
fungi | we're finally well over a million inodes free | 19:54 |
*** baoli has quit IRC | 19:55 | |
fungi | so i have hopes the current several patterns/lists under deletion will get us to our 1% free #status ok in relatively short order now | 19:55 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: convert deploy-guide to native zuul v3 https://review.openstack.org/511564 | 19:57 |
mordred | jeblair: good idea with the v3 job list! | 19:57 |
*** harlowja has joined #openstack-infra | 19:57 | |
fungi | definitely. this should knock out a good chunk | 19:57 |
fungi | along with the adjusted pattern for tripleo-ci ansible tempfiles and halving retention | 19:58 |
*** pcaruana has quit IRC | 19:58 | |
mordred | ++ | 19:58 |
*** baoli has joined #openstack-infra | 19:59 | |
fungi | and with a few teams making headway on reducing the number of files they're collecting, we should be in better shape in a couple weeks when we get back to a month of logs | 19:59 |
*** baoli has quit IRC | 19:59 | |
SamYaple | just switch to btrfs with dynamic inodes. simple. there have never been issues at scale with btrfs. | 20:00 |
mordred | SamYaple: yah - I can't see any potential issues with that at all | 20:01 |
jeblair | 2m inodes now, i clocked it at +125826 inodes/sec | 20:01 |
fungi | SamYaple: reiserfs also didn't have an inode maximum | 20:01 |
jeblair | er | 20:01 |
jeblair | 2m inodes now, i clocked it at +125826 inodes/min | 20:01 |
jeblair | /sec would be truly impressive. | 20:01 |
mordred | jeblair: that other rate would hve been amazing | 20:01 |
jeblair | so maybe 45m to get to 7m free? | 20:02 |
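That estimate roughly checks out against the numbers mentioned earlier (about 2m free now, ~7.7m needed for 1% free, ~125,826 freed per minute):

```shell
echo $(( (7700000 - 2000000) / 125826 ))   # => 45 (minutes)
```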
SamYaple | fungi: i dont want to... murder... performance though | 20:02 |
fungi | my freighter can make the kessel run in 125826 parsecs | 20:02 |
fungi | SamYaple: ooh, too soon | 20:02 |
clarkb | I've really been enjoying zfs locally | 20:02 |
clarkb | but scrubbing 12 TB of logs is probably very slow :/ | 20:03 |
SamYaple | yea zfs is da bomb for alot of things. but it has its weaknesses | 20:03 |
dmsimard | clarkb, fungi: I was reading ianw's middleware patch for serving tgz's ( https://review.openstack.org/#/c/122615/ ) and it gave me an idea.. how about we always create the /ara/ log directory with the ara sqlite database in it, and then a middleware intercepts requests to that directory: if a static report is not generated, it generates it? It doesn't sound overly complex to achieve and it would make it so the ara reports would only be generated on demand and if required | 20:03 |
fungi | clarkb: really, the slowness is the nearly a billion inodes | 20:03 |
fungi | not so much the block size | 20:03 |
clarkb | fungi: ya though it checksums all the data too iirc | 20:04 |
clarkb | at a block level | 20:04 |
openstackgerrit | Merged openstack-infra/shade master: Fix image task uploads https://review.openstack.org/511532 | 20:04 |
SamYaple | all data and metadata, yup | 20:04 |
fungi | clarkb: oh, so bandwidth hit | 20:04 |
clarkb | its so stupid simple to use | 20:04 |
clarkb | really like the simplicity of it | 20:04 |
SamYaple | and it can generate block devices | 20:04 |
SamYaple | its really nice | 20:05 |
SamYaple | thinly or thick provisioned block devices at that | 20:05 |
clarkb | would be two commands to have our current logs lvm set in place | 20:05 |
fungi | clarkb: i'll reserve "simple to use" for something with mainline kernel support i can use for my boot/rootfs | 20:06 |
SamYaple | fungi: 16.04 started including it | 20:07 |
SamYaple | fungi: so you can totally do that by default | 20:07 |
fungi | i suppose if i were to switch to freebsd it would be mainline | 20:07 |
clarkb | I mean ext4 + lvm is also relatively simple, just more verbose | 20:08 |
fungi | well, ubuntu is shipping out-of-tree kernel drivers for zfs, right? i thought the cddl was incompatible with the gplv2 | 20:08 |
clarkb | fungi: ya its a module | 20:08 |
SamYaple | there was a big license uproar about it, but it landed as "meh" | 20:08 |
SamYaple | the whole thing was "prove damages" and no one could | 20:08 |
SamYaple | so it didnt really go anywhere | 20:08 |
fungi | clarkb: yeah, all my personal systems boot from lvm2 | 20:08 |
clarkb | ya Fontana had a talk about it at seagl | 20:09 |
fungi | grub has fine support for searching logical volumes these days | 20:09 |
SamYaple | and zfs ;) | 20:09 |
clarkb | tldr: hard to show damages because source is provided on both sides and they aren't charging money for it that you'd be able to charge elsewhere | 20:09 |
SamYaple | that was my takeaway too | 20:09 |
clarkb | fungi: my zfs box is booting lvm + ext4 on a dedicated device which then mounts the zfs pool | 20:09 |
fungi | heh | 20:09 |
clarkb | approaching 2 million inodes | 20:10 |
clarkb | also scrubs are auto niced for you | 20:11 |
fungi | it's like we've finally reached warp factor 2 | 20:11 |
clarkb | so in theory they don't have major impact | 20:11 |
fungi | and the reactor hasn't even shaken apart | 20:12 |
SamYaple | warp 2 on which scale? | 20:12 |
SamYaple | this is important | 20:12 |
fungi | oh, cochrane scale, sorry | 20:13 |
clarkb | jeblair: what was the magic sauce for handling all those emails back in the day and inode counts? I imagine that was a very high inode to disk ratio? | 20:14 |
dmsimard | before we got sidetracked by filesystems discussion I was trying to brainstorm about solutions to help with the unfortunate contribution of ARA to the inode exhaustion :p Another low hanging fruit would be to consider generating an ara report only when there is a job failure | 20:14 |
clarkb | dmsimard: ya that was the idea pabelanger had earlier, I like it because I really only look at ara when things have broken | 20:14 |
pabelanger | and back | 20:14 |
pabelanger | catching up on backscroll | 20:14 |
clarkb | that seems like a relatively easy intermediate fix | 20:14 |
fungi | clarkb: "back in the day" your mailbox was one file which kept getting appended to | 20:15 |
SamYaple | was this inode issue the underlying problem with the mirror? | 20:15 |
dmsimard | clarkb, jeblair: would a post job know that the job is going to fail ? | 20:15 |
dmsimard | I guess the executor knows, but it's probably not passed on as a piece of information to the post jobs | 20:15 |
pabelanger | ianw: ya, the configure-mirror role should protect us with DIB_DISTRIBUTION_MIRROR; however, we could also fix it in finalize.d if we wanted | 20:15 |
*** gouthamr has joined #openstack-infra | 20:16 | |
mordred | dmsimard: yah - I believe there is a status variable that the post job should know about | 20:16 |
clarkb | SamYaple: no they were separate issues | 20:16 |
mordred | dmsimard: it's called 'success' | 20:16 |
clarkb | SamYaple: mirror is on afs, logs inode is a 12TB ext4 fs | 20:16 |
SamYaple | got it. and also ouch | 20:17 |
clarkb | ya when it rains it pours | 20:17 |
SamYaple | all this unrelated to rolling out zuulv3 ya? | 20:17 |
pabelanger | okay, ubuntu-trusty DIB uploading to rackspace and citycloud-kna1 still | 20:17 |
dmsimard | mordred: zuul_success ? | 20:18 |
dmsimard | mordred: like http://git.openstack.org/cgit/openstack-infra/project-config/tree/roles/submit-logstash-jobs/tasks/main.yaml#n6 | 20:18 |
clarkb | SamYaple: other than zuulv3 double running jobs potentially adding inodes to the logs fs, correct | 20:18 |
mordred | dmsimard: yes! | 20:18 |
fungi | SamYaple: correct (discounting that we were adding a few additional build logs from running extra copies of a lot of jobs under v3 which likely didn't help matters) | 20:18 |
SamYaple | man. craziness | 20:18 |
dmsimard | mordred: ok, I'll send a patch. | 20:18 |
*** hasharAway has quit IRC | 20:18 | |
mordred | dmsimard: cool - I think that'll buy us time to think about some of the other options | 20:19 |
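For context, the approach being discussed is a conditional in the log-collection post playbook: only invoke the ARA report role when the job did not succeed. A minimal sketch of that pattern, assuming a playbook running on the executor and a placeholder role name (the real zuul-jobs role and its exact layout may differ):

```yaml
# Sketch only: gate ARA report generation on the job result.
# 'generate-ara-report' is a placeholder role name, not the actual zuul-jobs role.
- hosts: localhost
  tasks:
    - name: Generate the ARA report only when the job failed
      include_role:
        name: generate-ara-report
      # zuul_success is the result variable mordred refers to; it may arrive
      # as a string, so cast it with the bool filter before negating.
      when: not (zuul_success | bool)
```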
pabelanger | mordred: Shrews: do you think the new shade release will be out today to address rackspace uploads? I'm asking because we might have to remove ubuntu-trusty from rackspace, since it is broken | 20:19 |
*** esberglu has quit IRC | 20:19 | |
fungi | pabelanger: honestly, we run so few jobs on trusty at this point that having it missing from a few regions for a while won't hurt us much | 20:20 |
mordred | pabelanger: yes - it should be not too much longer | 20:20 |
pabelanger | fungi: yah, that is true | 20:20 |
fungi | i would just go ahead and delete it there regardless of the timeline for getting a replacement uploaded | 20:20 |
pabelanger | fungi: I'll propose the patch | 20:21 |
fungi | pabelanger: we can't just delete the trusty images there and wait for a corrected upload to eventually work? | 20:21 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add a toggle to enable saving the ARA sqlite database https://review.openstack.org/511529 | 20:22 |
*** esberglu has joined #openstack-infra | 20:22 | |
pabelanger | fungi: Hmm, I think we could | 20:22 |
pabelanger | that might be better | 20:22 |
pabelanger | fungi: I'll start with rax-ord and see | 20:23 |
*** ijw has quit IRC | 20:25 | |
*** ijw has joined #openstack-infra | 20:25 | |
mnaser | when all this zuulv3 stuff settles, i would like to work with some infra core to add some monitoring, these things are so much easier to solve when you know they're coming up beforehand :( | 20:26 |
mnaser | i can share some of the stuff we do and the tooling of how to do it in a distributed way (mostly stateless sensu-server, servers define their own checks in sensu-client) .. but yeah, i think it'd make all of our lives easier to find out about issues in advance (hopefully) | 20:27 |
clarkb | mnaser: at the ptg the rough plan was to have a spec detailing the options available then going from there (just because there are so many tools and they have their own strengths and weaknesses) | 20:27 |
clarkb | mnaser: I think we were initially wary of sensu due to its open core nature and the need to run a message bus for it | 20:28 |
clarkb | (but it should be on the list of options probably) | 20:28 |
mnaser | clarkb i have a document going over many of the OSS monitoring tools and why we ended up at sensu so i'll find that and share it | 20:28 |
*** AJaeger has quit IRC | 20:29 | |
clarkb | 2.5million | 20:29 |
*** kgiusti has left #openstack-infra | 20:29 | |
*** rbrndt has joined #openstack-infra | 20:29 | |
mnaser | yeah... but honestly, we haven't run into any issues where we were like "dang, we'd want the enterprise for this one" .. the nice thing is that you don't have to maintain the checks in the server (unlike most other tools) but at the client, which makes it cleaner in writing puppet manifests and what not, but anyways, /me puts name down for that | 20:29 |
*** rbrndt has quit IRC | 20:29 | |
dmsimard | mnaser: I'm accountable for drafting a spec to do proactive monitoring | 20:30 |
dmsimard | mnaser: I signed up for that :) | 20:30 |
mnaser | oh even better :> | 20:30 |
*** jkilpatr_ has quit IRC | 20:30 | |
pabelanger | Shrews: when you have a moment, I'm not sure why image-delete is not working. I get back 'Image upload not found' | 20:30 |
pabelanger | Shrews: sudo -H -u nodepool nodepool image-delete --provider rax-ord --image ubuntu-trusty --upload-id 0000002659 --build-id 0000000001 | 20:30 |
dmsimard | mnaser: I believe it was part of when we discussed https://etherpad.openstack.org/p/queens-infra-metric-collection | 20:31 |
* fungi would prefer to see something based on an established standard, ideally snmp, but is willing to entertain other options | 20:31 | |
pabelanger | I don't mind nagios, after all these years | 20:31 |
fungi | definitely not a fan of monitoring systems which need server-side agents speaking nonstandard protocols | 20:31 |
fungi | i did deal with nrpe for many years on that front, but having proper snmp backends to check this is so much nicer | 20:32 |
*** e0ne has joined #openstack-infra | 20:32 | |
fungi | and net-snmp is very extensible if you want to write your own extensions for custom mibs | 20:32 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Update some legacy jobs https://review.openstack.org/511555 | 20:33 |
Shrews | pabelanger: hrm, not sure. may need to do some digging | 20:33 |
clarkb | fun question time. Assuming we've got the inodes and ubuntu mirror stuff under control, is the last outstanding item for v3 re-rollout 511260 to fix cache usage? | 20:33 |
fungi | and anyway, the two major issues we have would have been spotted by 1. trending inode usage for filesystems (there is a standard oid for that, easy enough to check over snmp) and 2. evaluating the last updated timestamps on our mirrors (these can be polled over http and analyzed quite trivially) | 20:34 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove legacy-.*python{34,35} jobs https://review.openstack.org/511557 | 20:34 |
fungi | clarkb: as far as i know, yes (well, and getting check/periodic pipelines added back i guess so people can dry-run their v3 jobs again) | 20:35 |
jeblair | infra-root, dmsimard: when folks have a moment, i'd like to have some semi-structured conversation about 1) options for ara in v3 followed by 2) refreshing the rollout plan for v3 | 20:36 |
clarkb | I'm good now | 20:36 |
jeblair | assuming we think that fires are out enough we can do that while we wait for bg tasks to complete | 20:36 |
*** rbrndt has joined #openstack-infra | 20:37 | |
fungi | jeblair: yep, i think we're in a good place for that now. inode usage has been dropping steadily rather than increasing for a while | 20:37 |
dmsimard | jeblair: I am working on a patch for emit-ara-html to be able to only generate a report on job failure, it's a low hanging fruit that we can put through fairly easily. | 20:37 |
fungi | thanks dmsimard! | 20:37 |
jeblair | let's use this etherpad: https://etherpad.openstack.org/p/hdYC2ZKfWd | 20:37 |
dmsimard | jeblair: Beyond that, it requires a bit of thinking outside the box -- whether that's figuring out how to translate a sqlite database to an ara interface on the fly somehow, or use a centralized instance, etc. | 20:37 |
jeblair | dmsimard: yeah -- let me articulate my current thinking: | 20:38 |
jeblair | point 1: we think running ara on every v3 job is bad for inodes | 20:38 |
jeblair | point 2: we want to roll out v3 soon | 20:38 |
jeblair | point 3: we should come up with short-term solutions to give us breathing room to roll out v3 | 20:39 |
Shrews | pabelanger: oh, i think you have upload-id and build-id backwards | 20:39 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Remove ubuntu-trusty from rackspace https://review.openstack.org/511570 | 20:39 |
pabelanger | Shrews: oh, maybe | 20:39 |
jeblair | point 4: there are long term changes that may make this better | 20:39 |
pabelanger | Shrews: let me test | 20:39 |
jeblair | so i'm thinking we mostly need to decide on a short term solution now to give us room to roll out v3 and implement long-term solutions | 20:39 |
pabelanger | Shrews: Better! Thanks | 20:40 |
jeblair | assuming we accept that point 1 is valid :) | 20:40 |
mordred | I agree with those 4 points and the goal | 20:40 |
* Shrews debugs pabelanger | 20:40 | |
*** hamzy has quit IRC | 20:40 | |
mordred | (like, I think that making it so that we can keep running ara on every change IS a thing that we want to do - but that is also likely to take slightly longer) | 20:40 |
pabelanger | thanks! it is now deleting | 20:40 |
*** Apoorva_ has joined #openstack-infra | 20:41 | |
dmsimard | I agree with that as well, however I'd appreciate highlighting that while ara contributes to the issue it is not solely responsible :( | 20:41 |
SamYaple | silly question, what is the ARA thing? | 20:41 |
clarkb | SamYaple: ara has a floor of like 400 files per job and a ceiling much higher depending on the job, so it uses a lot more inodes than before, when jobs might have had a couple files logged | 20:41 |
dmsimard | SamYaple: this: http://logs.openstack.org/21/474721/7/check/gate-openstack-ansible-openstack-ansible-ceph-ubuntu-xenial/779e047/logs/ara/ | 20:41 |
*** e0ne has quit IRC | 20:41 | |
clarkb | In theory our jobs succeed more than they fail and ara is a useful debugging tool. I'd be inclined to start with ara only on failure and if that isn't enough then possibly just remove it by default? | 20:42 |
SamYaple | oh i see | 20:42 |
clarkb | would it be possible to have ara locally accept the json file and emit a report? | 20:42 |
dmsimard | Going back to my previous statement, I'd like to make sure that we follow up with the other projects to make sure they are not needlessly logging things | 20:42 |
*** eharney has quit IRC | 20:42 | |
clarkb | so that we can continue logging the json file and then only feed it to ara if you know you want it? | 20:42 |
clarkb | dmsimard: yes we should continue pushing on that too | 20:43 |
jeblair | yes, though we also have longer term plans to rework logging so we may not care as much. | 20:43 |
dmsimard | clarkb: I am looking into doing something a bit like what you are proposing but with the sqlite database instead. Running off of the JSON would require more work. | 20:43 |
*** eharney has joined #openstack-infra | 20:43 | |
mordred | clarkb, dmsimard: I think that falls into the category of "medium to longer term we can make improvements to how we're using ARA or how ARA works or whatnot" | 20:43 |
*** Apoorva has quit IRC | 20:43 | |
jeblair | for instance, following mordred's proposal to its conclusion means we get to the point where we say "every job gets 100MB. put whatever you want in there. it goes in swift. we don't care" | 20:44 |
jeblair | so i think it's worthwhile to push back on some really large inode jobs, but i think that's not a long-term sustainable strategy. | 20:44 |
* dirk has a few inodes to give away | 20:44 | |
* mordred takes dirk's inodes | 20:44 | |
clarkb | jeblair: swift too has inode like limits last I looked into it | 20:45 |
mordred | jeblair, clarkb, dmsimard: so far I'm the biggest fan of dmsimard's change to run ara only on failure | 20:45 |
clarkb | jeblair: basically there is a ceiling on reasonable number of objects within a container to maintain performance | 20:45 |
pabelanger | I like #2 so far, but this isn't long term right? | 20:45 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Add method to set bootable flag on volumes https://review.openstack.org/502479 | 20:45 |
mordred | clarkb: nod - but once we get to that point we'll have a good central place in-which to place limits | 20:45 |
mordred | pabelanger: right - only short term | 20:45 |
jeblair | what's the collecting the sqlite file option? | 20:45 |
notmyname | clarkb: that "ceiling" is rather large, and it doesn't affect client performance | 20:46 |
jeblair | i also favor ara-on-failure at the moment, but i do want to make sure we survey the options | 20:46 |
mordred | jeblair: ++ | 20:46 |
clarkb | notmyname: I think it's roughly in the range of our current inode limit though, ~1 billion | 20:46 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Add group parameter to create_server https://review.openstack.org/511305 | 20:47 |
fungi | jeblair: doing something server-side with ara to render reports out of a database file in a similar way to how we use osla to render log files | 20:47 |
mordred | jeblair: dmsimard was investigating something wsgi-like to do the report generation only on-demand | 20:47 |
mordred | fungi: jinx | 20:47 |
notmyname | clarkb: but it's a per-container thing. so if you're doing a new container per job, that's ok. or container per project may be better | 20:47 |
jeblair | ah, interesting. i feel like that's probably a long-term thing. like, we should be weighing that against running a central server, (or static generation into swift) | 20:48 |
fungi | notmyname: agreed, and we could be doing something similar with local filesystems, but what we're doing now is akin to dumping them all in one container | 20:48 |
jeblair | dmsimard: is server-side sqlite generation a 2 day project or longer? | 20:48 |
mordred | yah - I think that's not a 'by tomorrow' kind of option and might take a little longer for us to be comfortable with it- especially since the main value in the ara reports is helping to diagnose job issues | 20:48 |
mordred | so if we need to make sure the on-the-fly report generation is solid ... | 20:49 |
dmsimard | jeblair: I'm not sure if it's the best approach, it would mean logs.o.o would be using its cpu for generating reports | 20:49 |
notmyname | ack. the current feature/deep work (hoped for by early next year) will solve that once and for all (ie N billions of objects per container is no problem, only limited by your installed hardware capacity) | 20:49 |
pabelanger | doesn't need to be logs.o.o, we could stand up another server | 20:49 |
mordred | notmyname: cool | 20:49 |
mordred | pabelanger: the files are sqlite | 20:49 |
clarkb | oh that is good to know re swift | 20:49 |
fungi | notmyname: neat! | 20:49 |
jeblair | pabelanger: it'd have to get the sqlite file from logs.o.o, which is not scalable | 20:49 |
dmsimard | jeblair: The direction I'm looking at, is more like... ara.openstack.org/?database=path/to/sqlite/in/logs.sqlite or something | 20:50 |
mordred | I think it's likely better use of hacking resources to figure out a centralized ara than an on-demand ara | 20:50 |
dmsimard | I don't know | 20:50 |
mordred | I could be wrong about that - but it's an unknown enough I doubt it's the solution for this week | 20:50 |
dmsimard | A centralized ara is not complicated, we just need to figure out how to feed the data back to the instance -- because we don't want to have the callback do a call to a remote mysql server on each task. Just the latency from farther nodepool regions would not be good. | 20:51 |
jeblair | dmsimard: i put that as long term option #3 | 20:51 |
mordred | dmsimard: yah - well, there's also grouping issues we'd need to figure out too with centralized | 20:51 |
dmsimard | An example would be to have a "post subunit gearman" thing and then have that import data back into a central instance | 20:51 |
dmsimard | mordred: right, that too. | 20:51 |
pabelanger | mqtt could be a long(er) term option too | 20:52 |
mordred | dmsimard: having one ara with 10000 playbook runs called "run.yaml" ... :) | 20:52 |
dmsimard | there's permalinks for playbooks but not for groups of playbooks | 20:52 |
fungi | oh, i just realized this is specifically long-term options for ara, not more general options for increasing inode capacity/decreasing inode usage on the logs site | 20:52 |
jeblair | pabelanger: how does mqtt help? | 20:52 |
mordred | fungi: well - no, option 1 is not about ara | 20:52 |
mordred | fungi: long term #1 is about offloading caring about inodes to swift - but has a few steps between us and it to be viable | 20:52 |
dmsimard | I have to step away momentarily... not going to have time to finish the patch for only running on failure, if anyone can put that up I can review in ~1hr | 20:53 |
dirk | Is moving stuff into a readonly squashfs mirror an option? That should save plenty of inodes | 20:53 |
pabelanger | jeblair: in my brain, we publish to mqtt, instead of sqlite, then have a series of ara collectors, to generate static bits, then upload some place. However, would be a lot more moving parts | 20:53 |
dmsimard | nevermind, looks like I got time | 20:54 |
mordred | dmsimard: if you run out of time I can take over | 20:54 |
jeblair | dirk: re-architecting log storage is a long-term option. that would be up there with "use ceph" and "use swift" | 20:55 |
pabelanger | fungi: Shrews: I've deleted ubuntu-trusty images from rax-ord | 20:55 |
mordred | dmsimard: or I can write it if you'd like to think about other things | 20:55 |
jeblair | fungi: that docs-draft thing is worth keeping in mind as it's a medium-term mitigation. | 20:55 |
dmsimard | mordred: nope, I'll have time to finish it. | 20:56 |
dmsimard | mordred: also, it's something we have to be very cautious about.. remember it's in the base job and it's not something that is integration tested :( | 20:57 |
*** caphrim007_ has quit IRC | 20:57 | |
mordred | dmsimard: ++ | 20:57 |
fungi | jeblair: agreed | 20:57 |
clarkb | for 4 weeks of logs ~28760941 is the number of inodes we can use per day | 20:57 |
*** caphrim007 has joined #openstack-infra | 20:58 | |
jeblair | okay, we only have two short term options: #2 run ara on success. #3 don't run ara at all. anything else we can do short-term? | 20:58 |
clarkb | if we want to run 25k jobs per day that is about 1150 inodes per job average | 20:58 |
clarkb | (25k per day is our rough peak from a year ago iirc) | 20:58 |
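Spelling out the arithmetic behind those figures, assuming a 4-week (28-day) retention window; the total inode count for the volume is inferred from the per-day number rather than stated in the log:

```latex
% inodes available per day, given roughly 8.05e8 total inodes on the volume
\frac{8.05\times10^{8}\ \text{inodes}}{28\ \text{days}} \approx 2.88\times10^{7}\ \text{inodes/day}
% per-job budget at the ~25k jobs/day peak
\frac{2.88\times10^{7}\ \text{inodes/day}}{25{,}000\ \text{jobs/day}} \approx 1150\ \text{inodes/job}
```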
mordred | jeblair: those are the only things I can think of right now for short term | 20:58 |
*** dangers is now known as dangers_away | 20:58 | |
jeblair | clarkb: that makes me think reducing log retention should be in the short-term list | 20:58 |
*** caphrim007_ has joined #openstack-infra | 20:59 | |
clarkb | I think 10-15k is likely a more reasonable current average jobs per day, which will roughly double the per-job inode budget | 20:59 |
*** caphrim007_ has quit IRC | 20:59 | |
jeblair | should we also consider reducing to 3 weeks retention as a short-term solution? | 21:00 |
*** iyamahat has quit IRC | 21:00 | |
mordred | ++ | 21:00 |
clarkb | devstack, grenade, tempest, tox related jobs all fit into that set of limitations based on my scanning. But osa, tripleo, and potentially others don't | 21:00 |
clarkb | jeblair: ya I think so | 21:00 |
fungi | ranking preferences, i would have to say i vote 2,3,4,1 | 21:00 |
pabelanger | I like #2 if we can swing it | 21:00 |
pabelanger | but, understand if we have to do #3 | 21:00 |
mordred | well - also - the osa/tripleo inode counts are the same v2 and v3 | 21:00 |
dmsimard | mordred: what are the values for zuul_success ? It seems like it's either undefined or true | 21:01 |
mordred | the main thing is the additional inodes from ara run by v3's ansible | 21:01 |
clarkb | mordred: yes, roughly the same | 21:01 |
mordred | dmsimard: yah - use the | boolean filter | 21:01 |
dmsimard | mordred: ok | 21:01 |
mordred | clarkb: I _think_ a normal zuul-generated-ara report is around 500 inodes | 21:01 |
fungi | we're up over 3m inodes free now, btw | 21:01 |
*** iyamahat has joined #openstack-infra | 21:01 | |
*** caphrim007 has quit IRC | 21:02 | |
clarkb | mordred: ~400 seems to be the low end | 21:02 |
clarkb | mordred: for eg pep8 jobs | 21:02 |
mordred | (as opposed to the multi-10k reports from some of the larger jobs that are using ara in their job content | 21:02 |
Shrews | wait, "run ara on success"? not failure? i must've missed something, cause failure is where ara is most helpful, yeah? | 21:02 |
jeblair | i was leaning toward 2+4 together then fallback to 3. | 21:02 |
mordred | Shrews: on failure | 21:02 |
*** trown is now known as trown|outtypewww | 21:02 | |
clarkb | jeblair: ya I think that is my preference too | 21:02 |
fungi | Shrews: run on failure, using the zuul_success variable to determine whether there was a failure | 21:02 |
clarkb | 3 is fallback from 2 | 21:02 |
mordred | my prefernce is also 2+4 and fallback to 3 | 21:03 |
jeblair | fungi: if i interpret your earlier statement, you'd prefer to keep retention at 4 weeks even if it means not running ara at all? | 21:03 |
fungi | i'm uncertain option 4 is strictly necessary (aside from the one-time expiration i'm doing right now to deal with the current crisis) | 21:04 |
dmsimard | What we need to keep in mind is that we'll need to retrofit the 'generate ara only on failure' to openstack-ansible, tripleo and kolla-ansible as well, they are using it outside of zuul v3 | 21:04 |
fungi | but yeah, i see options 3 and 4 as roughly equal preference | 21:04 |
*** edmondsw has quit IRC | 21:05 | |
clarkb | dmsimard: I think thats a separate concern of continuing to work with various projects to prune and curate the logs they collect | 21:05 |
fungi | so maybe my preference is 2,3|4,3&4,1 | 21:05 |
clarkb | dmsimard: that might involve only running ara on failure along with cleaning up etc/ and so on | 21:05 |
jeblair | fungi, clarkb, mordred: i think the compromise position then is 2, then 4, then 3. how's that sound? | 21:06 |
fungi | wfm | 21:06 |
dmsimard | I'll have a patch up for #2 soon.. just being extra careful about it and testing every bit of it | 21:06 |
mordred | ++ | 21:06 |
fungi | dmsimard: appreciated! | 21:06 |
clarkb | jeblair: sounds like a plan | 21:06 |
jeblair | okay, i put that in the etherpad | 21:07 |
jeblair | the next thing, while we're all here, is how we should proceed with v3 rollout | 21:07 |
dmsimard | mordred: zuul_success is undefined on failure, right ? (double making sure) | 21:07 |
jeblair | i'm inclined to say that we should allocate tomorrow as a day for continued stabilization | 21:07 |
*** jkilpatr_ has joined #openstack-infra | 21:08 | |
jeblair | the mirror issue may not be resolved by tonight, or even by tomorrow | 21:08 |
SamYaple | jeblair: but i want my zuulv3 for the weekend :( | 21:08 |
clarkb | 511260 finally appears close to merging | 21:08 |
dmsimard | +1, rolling out on a friday is not a good idea | 21:08 |
*** thorst has quit IRC | 21:08 | |
clarkb | jeblair: do we want to maybe turn check et al back on in v3 and watch it? | 21:08 |
fungi | jeblair: yes, at this point i'd be concerned about rolling back onto v3 on a friday | 21:08 |
clarkb | maybe after the ara on failure thing is in place | 21:09 |
mordred | dmsimard, jeblair: two things - a) it's always defined in post playbooks b) it doesn't get set to false if a pre-playbook or a previous post playbook failed | 21:09 |
dmsimard | mordred: ok. | 21:09 |
mordred | so I think we should also make a patch to zuul to make sure we set either zuul_success or a new variable if ANY of the playbooks fail | 21:09 |
jeblair | mordred: maybe we need a new var? | 21:09 |
mordred | jeblair: yah. let's do that ... I can make that patch | 21:09 |
jeblair | yeah, one of those things. i'm not sure which yet. :) | 21:10 |
fungi | clarkb: i could see turning check pipelines back on in v3 sometime tomorrow, as overal ci volume tends to trail off around 16:00z or so | 21:10 |
*** kjackal_ has quit IRC | 21:10 | |
fungi | on fridays | 21:10 |
jeblair | even with ara-on-failure in place, are we worried about general additional volume? | 21:10 |
pabelanger | jeblair: fungi: if we did roll out on friday, there would possibly be low job volume over the weekend for fixes eager people would want to make | 21:10 |
pabelanger | but, agree we should stabilize first | 21:10 |
jeblair | it's not a lot, though it will run continuously over the weekend and still emit more than normal log volume | 21:10 |
fungi | pabelanger: but fewer of _us_ around to deal with lurking bugs in zuul v3 we have yet to uncover | 21:10 |
jeblair | also, we've had >24h partial outage; i kinda don't want to push it. | 21:11 |
jeblair | it might be nice for folks to be able to land changes for a few minutes. :) | 21:11 |
fungi | yup | 21:11 |
*** eharney has quit IRC | 21:12 | |
clarkb | ya I also don't like feeling compelled to firefight over the weekend :) | 21:12 |
jeblair | i'm going to start a new section on that etherpad at the bottom | 21:12 |
*** gouthamr has quit IRC | 21:13 | |
fungi | i so hope it's a section listing new drinking games i can try over the weekend | 21:14 |
jeblair | i don't normally like to say "can i please work on the weekend?" but at this point would be willing to help flip the switch sunday evening. | 21:14 |
*** eharney has joined #openstack-infra | 21:14 | |
mordred | yah. I think a sunday rollout is not a terrible idea | 21:14 |
jeblair | and of course, our 1100utc plan for monday or tuesday would be fine too. | 21:14 |
fungi | i could see doing it late sunday _if_ ianw and yolanda are handy to keep an eye on things | 21:15 |
mordred | yes - all three work for me- I'll be flying starting afternoon local time on monday ... | 21:15 |
mordred | so if we do monday morning I may be less available to help than other times | 21:15 |
clarkb | ya I don't mind a later sunday | 21:15 |
fungi | otherwise it'll be one of those where i wake up and never get a chance to pour a cup of coffee | 21:15 |
mordred | I think my preference is sunday, tuesday, monday | 21:15 |
mordred | (or, rather, those are ordered by amount of time/effort I'll be able to directly contribute) | 21:16 |
clarkb | I like sunday because we'll be able to hopefully sort out any issues without the full load of the system on it | 21:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Switch success to false if a post playbook fails https://review.openstack.org/511619 | 21:17 |
clarkb | it was definitely easier to fix problems over the weekend after the last rollout | 21:17 |
dmsimard | what's the rush? | 21:17 |
dmsimard | why sunday ? | 21:17 |
fungi | dmsimard: the longer we wait, the later into the release cycle we creep with these disruptions | 21:18 |
mordred | dmsimard: job config is effectively frozen for people | 21:18 |
mordred | and yah - what fungi said | 21:18 |
dmsimard | okay, that's fair | 21:18 |
mordred | dmsimard, jeblair, clarkb: https://review.openstack.org/511619 is the zuul_success patch | 21:19 |
clarkb | I've also got to start prepping for event things (summit mostly) so earlier the better for me | 21:19 |
jeblair | okay, to my secret disappointment, no one has vetoed sunday :) | 21:19 |
mordred | jeblair: :) | 21:19 |
jeblair | what time sunday works for east-coasters? | 21:19 |
dmsimard | mordred: I'm covering for the edge case where it might not be defined | 21:19 |
jeblair | and, erm, central coasters? | 21:19 |
mordred | dmsimard: that's great - that patch is just making sure that if a post playbook fails that subsequent post playbooks get success==false | 21:20 |
dmsimard | My mother is coming to visit this weekend, I'm east coaster and can respond to pings but might not be available for longer periods of sustained work | 21:20 |
fungi | i have no hard scheduled obligations next week, and am happy to support whatever/whenever people want to do the next v3 rollout attempt | 21:20 |
fungi | i can be around as late as, say, 04:00z | 21:21 |
jeblair | dmsimard: well, i'm not expecting us to do sustained work on sunday, more like perform the transition so that people start the day on v3 | 21:21 |
fungi | (which is technically utc monday, not sunday, but whatevs) | 21:21 |
jeblair | start monday on v3 that is | 21:21 |
mordred | jeblair: agree | 21:21 |
pabelanger | I'm traveling on Wed, Thur, Friday next week too | 21:21 |
mordred | turns out flipping the switch itself is actually not too hard | 21:21 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add the ability to generate an ARA report only on job failure https://review.openstack.org/511622 | 21:21 |
dmsimard | mordred, jeblair, fungi, clarkb, pabelanger ^ | 21:22 |
dmsimard | I *think* that's okay, but please do review it carefully, no integration tests and all | 21:22 |
Shrews | i can be around late sunday, but am having my tooth cleaned monday morning so less around then | 21:22 |
*** aviau has quit IRC | 21:22 | |
fungi | Shrews: just the one tooth, eh? | 21:22 |
dmsimard | lol | 21:22 |
*** aviau has joined #openstack-infra | 21:22 | |
*** jascott1 has quit IRC | 21:22 | |
* clarkb somewhat arbitrarily throws out 2200UTC | 21:23 | |
Shrews | fungi: of course. all i need | 21:23 |
clarkb | that is mid afternoon for pacific coasters and evening for eastern/central coasters | 21:23 |
fungi | Shrews: as long as you can open a beer with it, you're all set i suppose | 21:23 |
jeblair | clarkb: that wfm | 21:23 |
mordred | clarkb: 2200 wfm | 21:23 |
fungi | i'm cool with a 22-23:00z window | 21:24 |
dmsimard | jeblair: vars across jobs are merged or replaced ? | 21:24 |
fungi | that's like 6-7pm local here so early by my standards | 21:24 |
dmsimard | jeblair: I mean, if we put 'ara_generate_report: failure' as a var in the base job(s), it's going to stick around, right ? | 21:24 |
jeblair | dmsimard: merged | 21:24 |
dmsimard | jeblair: ok, nice. | 21:24 |
*** mat128 has quit IRC | 21:25 | |
jeblair | dmsimard: yes. it will be overridable by children, but i'm not worried about folks overriding that. for now. | 21:25 |
mordred | ++ | 21:25 |
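What dmsimard describes would look roughly like the following in the base job definition: the toggle set in vars so every child job inherits it unless it explicitly overrides it. This is a sketch of the idea, not the merged change:

```yaml
# Sketch: default the toggle on the base job so all children inherit it.
# Children could still override or drop the value, as discussed above.
- job:
    name: base
    vars:
      ara_generate_report: failure
```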
clarkb | ok should we call it 2200UTC sunday then? | 21:25 |
mordred | ++ | 21:25 |
clarkb | use the rest of today and tomorrow to stabilize | 21:25 |
jeblair | dmsimard: let's just stick in some documentation asking folks to please not override it. :) | 21:25 |
jeblair | clarkb: ++ | 21:25 |
fungi | clarkb: sgtm | 21:25 |
dmsimard | jeblair: I was more worried about someone declaring just 'vars' and removing it more than someone overriding the default value | 21:26 |
clarkb | mordred: do you want to follow up to your thread about the mirror and devstack-gate with a zomg inodes but now we are looking to be in a better place and aiming for sunday rollout? | 21:26 |
*** aeng has joined #openstack-infra | 21:26 | |
mordred | sure! | 21:26 |
openstackgerrit | David Moreau Simard proposed openstack-infra/project-config master: Test ARA report generation only on failure in base-test https://review.openstack.org/511624 | 21:26 |
dmsimard | ^ testing the toggle in base-test | 21:27 |
fungi | we're just over 4m inodes free now, so about halfway to where we want to be for a 1% free cushion | 21:27 |
clarkb | oh maybe wait for the cushion before emailing | 21:27 |
clarkb | and possibly sneak in a "please review your logs and remove unnecessary things like logging all of etc or selinux or systemd units" | 21:27 |
fungi | yes, we can gloat about what a great position we're in with only 99% inode utilization on that volume ;) | 21:28 |
clarkb | I could also delete the logs for this one change and free up 2.8 million inodes | 21:28 |
clarkb | to get us there quicker >_> | 21:28 |
fungi | deletion of /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/ercloud/tmp completed a little while ago, looks like | 21:29 |
clarkb | fungi: hrm I was still seeing them | 21:29 |
clarkb | was it undercloud and not ercloud? | 21:30 |
fungi | er, that's a terrible pattern | 21:30 |
fungi | i must have missed an un when i edited that line. restarting :/ | 21:30 |
fungi | "und" added and rerunning. /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/undercloud/tmp this time | 21:31 |
mordred | fungi: that seems gooder | 21:31 |
pabelanger | fungi: clarkb: jeblair: gnutls issue on ubuntu-trusty fixed now too | 21:31 |
clarkb | fungi: oh I think there is a bug in that too | 21:31 |
fungi | pabelanger: excellent! | 21:31 |
pabelanger | and bad images from rackspace deleted | 21:31 |
clarkb | fungi: needs to be gate-tripleo-ci-*/*/logs/undercloud/tmp | 21:31 |
pabelanger | should be re-uploaded once shade has been released | 21:32 |
clarkb | fungi: to get the build uuid | 21:32 |
fungi | clarkb: yep, you're right. fixing | 21:32 |
pabelanger | fungi: clarkb: which means we can then land https://review.openstack.org/511543/ to fix system-config | 21:32 |
*** thorst has joined #openstack-infra | 21:33 | |
clarkb | pabelanger: I've approved it so should be fine if the recheck comes around good | 21:33 |
mnaser | yay, things getting fixed | 21:34 |
pabelanger | distributed sysops | 21:34 |
mordred | dmsimard: one comment | 21:34 |
*** eharney has quit IRC | 21:34 | |
mordred | dmsimard: otherwise looks good to me | 21:34 |
mordred | clarkb: I would not oppose you deleting all the logs for that one job :) | 21:35 |
pabelanger | clarkb: I've also just removed nb04.o.o from emergency file | 21:35 |
*** thorst has quit IRC | 21:36 | |
*** lifeless has quit IRC | 21:37 | |
dmsimard | mordred: the reason it took longer to get the patch up is that I was testing exactly your comment | 21:39 |
dmsimard | mordred: the problem is that false and failure have different behaviors | 21:39 |
dmsimard | mordred: false == never generate, failure == only generate on failure | 21:39 |
mordred | dmsimard: yes - they do - but the first condition is checking for true | 21:40 |
dmsimard | true would be == always generate | 21:40 |
mordred | dmsimard: so == true and | bool should both have the same effect | 21:40 |
mordred | you're using | bool on the false branch already | 21:40 |
dmsimard | oh | 21:40 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 21:40 |
dmsimard | let me test | 21:40 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Add method to set bootable flag on volumes https://review.openstack.org/502479 | 21:40 |
mordred | dmsimard: it's possible this is dumb - my python brain is rejecting == true - but maybe in jinja == true is ok? | 21:40 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Allow domain_id for roles https://review.openstack.org/496992 | 21:41 |
clarkb | 511260 should enter the gate shortly, last job is running against it now | 21:41 |
dmsimard | mordred: yeah, == true is okay and works | 21:41 |
clarkb | did we decide on whether or not we should reenable v3 pipelines? | 21:41 |
dmsimard | mordred: but | bool also works | 21:41 |
clarkb | jeblair: ^ | 21:41 |
dmsimard | mordred: I was afraid of using | bool for what might end up being a string (that would evaluate to true) | 21:42 |
dmsimard | mordred: in python, a non-empty string is true | 21:42 |
mordred | dmsimard: no - | bool on a string is false (I just checked that in ansible) | 21:42 |
*** srobert_ has joined #openstack-infra | 21:42 | |
dmsimard | mordred: but in the jinja bool filter, it looks like a non-empty string (that is not 'true') evaluates to false | 21:42 |
mordred | dmsimard: BUT ... if == true works in jinja, let's do it | 21:42 |
openstackgerrit | Merged openstack-infra/puppet-openstack_infra_spec_helper master: Cap signet < 0.8.0 https://review.openstack.org/511543 | 21:42 |
dmsimard | mordred: | bool is fine, I tested it and it works.. it's just my brain confusing python and ansible/jinja booleans :/ | 21:43 |
mordred | dmsimard: or, rather, I'm fine either way now that we've verified that | bool and == true both have the same impact | 21:43 |
clarkb | pabelanger: ^ there we go | 21:43 |
openstackgerrit | David Shrewsbury proposed openstack-infra/shade master: Move role normalization to normalize.py https://review.openstack.org/500170 | 21:43 |
jeblair | clarkb: drat. we did not. | 21:43 |
mordred | maybe re-enable tomorrow once the system has stabilized more? | 21:43 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add the ability to generate an ARA report only on job failure https://review.openstack.org/511622 | 21:43 |
dmsimard | mordred: ^ now with | bool | 21:43 |
jeblair | mordred: yeah, that sounds like a plan | 21:44 |
clarkb | wfm | 21:44 |
pabelanger | clarkb: yah, rechecking system-config now | 21:44 |
clarkb | over 5 million now | 21:44 |
dmsimard | mordred: also added a comment for posterity | 21:44 |
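To make the | bool versus == true exchange concrete, the toggle logic under discussion amounts to something like the tasks below. Role, variable, and value names follow the conversation but are assumptions about the patch, not quotes from it; per the testing above, | bool and == true behaved the same for the values involved:

```yaml
# Sketch of the three toggle behaviors: true = always generate,
# 'failure' = only on failure, false = never (neither condition matches).
- name: Generate the report whenever the toggle is simply true
  include_role:
    name: generate-ara-report          # placeholder role name
  when: ara_generate_report | bool

- name: Generate the report only when the toggle is 'failure' and the job failed
  include_role:
    name: generate-ara-report
  when: ara_generate_report == 'failure' and not (zuul_success | bool)
```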
dmsimard | clarkb: wow that's over 9000 | 21:44 |
fungi | mordred: jeblair: clarkb: i suggested tomorrow as well, mainly because around 16:00z on a friday utilization will start to trail off heading into the weekend so we can continue to make progress on inode cleanup in the background | 21:44 |
openstackgerrit | Mohammed Naser proposed openstack-infra/openstack-zuul-jobs master: Drop tox_constraints_file from include_role for release notes https://review.openstack.org/511627 | 21:45 |
openstackgerrit | Mohammed Naser proposed openstack-infra/openstack-zuul-jobs master: Move tox_envlist into job variables for releasenote jobs https://review.openstack.org/511628 | 21:45 |
*** srobert has quit IRC | 21:45 | |
mnaser | i identified two issues with releasenote jobs, very quick review mordred ^ | 21:45 |
fungi | somewhere around 16-19:00z at any rate | 21:45 |
mnaser | you can also see the failure happening here - http://logs.openstack.org/54/511054/1/check/build-openstack-releasenotes/6aeff53/job-output.txt.gz | 21:45 |
fungi | but for now, i need to disappear for a while. back later | 21:46 |
mnaser | (sorry just stomping here mid-discussion) | 21:46 |
*** jcoufal has quit IRC | 21:46 | |
*** srobert_ has quit IRC | 21:47 | |
mordred | mnaser: yes. great patches | 21:47 |
*** esberglu has quit IRC | 21:47 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Switch success to false if a post playbook fails https://review.openstack.org/511619 | 21:48 |
*** esberglu has joined #openstack-infra | 21:48 | |
mnaser | should i add those to any etherpad that's being used so they get eyes, or maybe someone around here can give them a quick review (i haven't been following much today) | 21:48 |
*** iyamahat_ has joined #openstack-infra | 21:48 | |
*** boden has quit IRC | 21:49 | |
mordred | clarkb, pabelanger, fungi: mnaser's patches above lgtm - could use an extra set of eyeballs | 21:49 |
*** iyamahat has quit IRC | 21:49 | |
*** slaweq_ has quit IRC | 21:50 | |
pabelanger | +3 | 21:50 |
jeblair | mordred: i know we all have brainhurt, but do you want to chat about periodic jobs and branches now? | 21:50 |
mordred | jeblair: yes - although I just had an idea I want to float first ... | 21:51 |
jeblair | https://review.openstack.org/511533 was the change which brought it up | 21:51 |
mordred | jeblair: does abandoning a change emit an event zuul can act on? | 21:51 |
jeblair | mordred: yes | 21:51 |
jeblair | (whether it correctly does act on it in either master or v3 atm, i could not say for certain) | 21:52 |
pabelanger | clarkb: zomg, centos-7 failed again with ssh key | 21:52 |
mordred | jeblair: what if we made a 'cleanup' pipeline that ran when a change is abandoned, and had something that would delete the logs for an abandoned change ... | 21:52 |
pabelanger | clarkb: looks like I'll be debugging that next | 21:52 |
mordred | (came to mind as I just abandoned a bunch of DNM test patches that each had a ton of tests associated with them) | 21:52 |
*** esberglu has quit IRC | 21:52 | |
jeblair | mordred: the v3 logs have the abandoned tests enabled, so it should be functioning. | 21:53 |
pabelanger | for now, I have to run. | 21:53 |
jeblair | mordred: interesting. i *think* that would take code changes. | 21:53 |
jeblair | mordred: i believe we have zuul hard-coded to remove abandoned changes from pipelines | 21:53 |
jeblair | mordred: so we'd have to drop that and rely on the 'status:open' pipeline requirement | 21:53 |
jeblair | mordred: i think it's feasible; couple hours of work. | 21:54 |
*** threestrands has joined #openstack-infra | 21:54 | |
jeblair | (tbh, i think that's the better long-term structure for the code anyway) | 21:55 |
*** Apoorva_ has quit IRC | 21:55 | |
jeblair | (i think the hard-coding predates pipeline requirements) | 21:55 |
mordred | jeblair: maybe let's put that on the backburner for next time we feel like hacking in such areas ... if we did that, I think perhaps we rename the merge-check template to "system-required" and put the 'delete logs' job into that - so that the openstack story is "you always have to have the system-required template" | 21:55 |
*** gouthamr has joined #openstack-infra | 21:55 | |
jeblair | ++ | 21:56 |
mordred | maybe we should do that second part anyway | 21:56 |
jeblair | ++ | 21:56 |
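A rough sketch of the 'cleanup' pipeline idea in Zuul pipeline syntax, assuming the gerrit driver's change-abandoned event. As jeblair notes, the scheduler currently hard-removes abandoned changes from pipelines, so this illustrates the concept rather than something that would work without the code changes discussed; the log-deletion job itself is not shown:

```yaml
# Concept sketch: a pipeline that reacts to abandoned changes so a
# log-cleanup job (not shown) could run against them.
- pipeline:
    name: cleanup
    manager: independent
    trigger:
      gerrit:
        - event: change-abandoned
```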
mordred | jeblair: ok - so - branches | 21:57 |
jeblair | the idea in v3 is that periodic jobs are just like regular jobs. so instead of putting "periodic-foo-master" and "periodic-foo-pike" on a project, you just put "foo" | 21:57 |
mordred | jeblair: yes! this I agree with wholeheartedly | 21:58 |
jeblair | zuul emits trigger events for every project-branch combination | 21:58 |
mordred | jeblair: and now I believe I understand what you were saying | 21:58 |
*** wolverineav has quit IRC | 21:58 | |
jeblair | so if you add a periodic job to a project, it'll run on all that project's branches | 21:58 |
jeblair | so if you only want it to run on a subset of branches, you just use branch matchers in the project-pipeline in the regular way | 21:58 |
mordred | jeblair: so instead of putting branch-override ... yah. that | 21:58 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Drop tox_constraints_file from include_role for release notes https://review.openstack.org/511627 | 21:58 |
mordred | jeblair: cool. that all makes sense to me- and yes, I think that's definitely the way to go for new jobs | 21:59 |
jeblair | (or, you can put the branch matcher on the job definition itself, if that's something you can say globally) | 21:59 |
jeblair | mordred: yep | 21:59 |
*** yamamoto has joined #openstack-infra | 21:59 | |
*** lifeless has joined #openstack-infra | 22:00 | |
mordred | jeblair: for *legacy* jobs... I think ajaeger's patch - except s/override-branch/branches/ is the right thing | 22:00 |
mordred | jeblair: because those generated jobs are all expecting to only be triggered for the branch in question | 22:00 |
*** wolverineav has joined #openstack-infra | 22:00 | |
mordred | jeblair: and we should definitely replace them all with new v3 jobs that are done correctly - but I don't think we should try to correct them semantically in place | 22:00 |
jeblair | okay, that works for me. | 22:01 |
mordred | (I worry that if we tried to collapse at that scale we'd get something weirdly wrong) | 22:01 |
jeblair | good point | 22:01 |
mordred | branches can take a scalar right? | 22:01 |
jeblair | mordred: a scalar or a list | 22:01 |
mordred | so "branches: master" works? awesome | 22:01 |
mordred | I'll modify that patch real quick | 22:01 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements https://review.openstack.org/503645 | 22:02 |
*** esberglu has joined #openstack-infra | 22:02 | |
jeblair | yep. as does "branches: [stable/ocata, stable/pike]". this is, i think, going to be a big improvement. :) | 22:02 |
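Put together, the periodic-job branch matching jeblair describes might look like the following project stanza; the project, job, and pipeline names here are placeholders:

```yaml
# Sketch: per-branch matchers on jobs attached to a periodic pipeline,
# showing both the scalar and the list form of 'branches'.
- project:
    name: openstack/example-project
    periodic:
      jobs:
        - legacy-periodic-example-job:
            branches: master
        - legacy-periodic-other-job:
            branches:
              - stable/ocata
              - stable/pike
```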
jeblair | the checksums process is 93% complete; i'm afk for 20m. | 22:03 |
*** iyamahat_ has quit IRC | 22:03 | |
*** iyamahat__ has joined #openstack-infra | 22:03 | |
mordred | jeblair: while I'm doing that - you feel like restarting v3 scheduler to pick up the zuul_success fix so we can test dmsimard's patch? | 22:03 |
mordred | or - afk - that's better | 22:03 |
clarkb | pabelanger: I have a change up to help debug that by dumping data from config drive and /home/root/.ssh/authorized_keys, not sure if it merged | 22:04 |
dmsimard | I have to relocate, I'll be back in >1hr | 22:06 |
*** esberglu has quit IRC | 22:06 | |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add branches to all periodic jobs https://review.openstack.org/511533 | 22:07 |
*** rbrndt has quit IRC | 22:08 | |
openstackgerrit | Merged openstack-infra/system-config master: Add documentation on force-merging a change https://review.openstack.org/511248 | 22:08 |
openstackgerrit | Merged openstack-infra/shade master: Add group parameter to create_server https://review.openstack.org/511305 | 22:10 |
*** rbrndt has joined #openstack-infra | 22:10 | |
*** jascott1 has joined #openstack-infra | 22:12 | |
*** bobh has quit IRC | 22:12 | |
*** mriedem1 has joined #openstack-infra | 22:14 | |
*** mriedem has quit IRC | 22:15 | |
*** jascott1 has quit IRC | 22:16 | |
*** jascott1 has joined #openstack-infra | 22:16 | |
*** Keitaro has quit IRC | 22:20 | |
*** jascott1 has quit IRC | 22:21 | |
*** gildub has joined #openstack-infra | 22:22 | |
clarkb | pabelanger: https://review.openstack.org/#/c/501887/ | 22:25 |
*** jascott1 has joined #openstack-infra | 22:25 | |
clarkb | I've rechecked it, I was sort of hoping we'd catch a failure premerge | 22:26 |
clarkb | we are at 99% used now | 22:26 |
clarkb | fungi: ^ fyi | 22:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use native propose-translation jobs https://review.openstack.org/511435 | 22:28 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add branches to all periodic jobs https://review.openstack.org/511533 | 22:28 |
clarkb | I've approved 511260 now | 22:28 |
*** threestrands has quit IRC | 22:28 | |
mordred | jeblair, clarkb: ^^ updated both patches - they should be correct for v3 now | 22:28 |
clarkb | I'm reviewing the ara on failure related changes now | 22:30 |
mordred | cool | 22:30 |
jeblair | mordred: back. i think that's an executor fix. we can leave the scheduler and restart the execs | 22:30 |
mordred | jeblair: oh! good point | 22:30 |
jeblair | mordred: i will do that | 22:30 |
mordred | jeblair: thanks | 22:30 |
*** gouthamr has quit IRC | 22:30 | |
jeblair | okay, i ran "service zuul-executor stop" on all ze machines, and they all stopped cleanly | 22:32 |
jeblair | mordred: there's something i think i need your help with though | 22:33 |
jeblair | mordred: every time zuul is installed, it seems to ignore the "-e git+https" requirement and re-installs the new version of gitpython | 22:33 |
jeblair | mordred: i think clarkb said pbr may somehow be involved | 22:33 |
fungi | i just knew if i stopped to eat something, i'd miss the great unveiling of the 99% | 22:34 |
clarkb | jeblair: mordred yes it is, basically setuptools doesn't understand those requirements so when pbr reads those reqs into setuptools it strips all the git stuff out and uses the egg as is | 22:34 |
clarkb | jeblair: mordred you have to install requirements with pip directly to get it to do what you expect | 22:34 |
*** Keitaro has joined #openstack-infra | 22:34 | |
clarkb | mordred: dmsimard re https://review.openstack.org/#/c/511624/1 what tests use base-test? I'd like to see them not include ara reports on success | 22:35 |
jeblair | clarkb: yeah, but we re-install zuul on every commit. so that means we're now uninstalling our gitpython fork on every commit. | 22:35 |
clarkb | jeblair: probably the thing to do is make zuul install a pip install -U /path/to/zuul && pip install -U /path/to/zuul/requirements.txt ? | 22:35 |
ianw | jeblair: i see the checksums has now exceeded the size of the original. that's good, i guess? | 22:36 |
jeblair | clarkb: the procedure is to land a change to base-test, then create a DNM change which reparents any job (say, unittests) to base-test, and examine the results. if that works, then we copy the change from base-test to base. | 22:36 |
*** lin_yang has joined #openstack-infra | 22:36 | |
jeblair | clarkb: nothing normally uses base-test. so it's safe to land changes to it as long as they look reasonable. | 22:37 |
clarkb | jeblair: gotcha, so we have to merge the things first | 22:37 |
jeblair | yep. we should put this in some documentation around there :) | 22:37 |
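The reparenting step jeblair describes is just a job variant pointing at base-test; a minimal sketch of such a DNM test change, with a placeholder job name:

```yaml
# Sketch: temporarily reparent a job to base-test to exercise the new
# base-job behavior before copying the change from base-test to base.
- job:
    name: example-unittests-job
    parent: base-test
```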
jeblair | ianw: neat! | 22:37 |
*** threestrands has joined #openstack-infra | 22:37 | |
clarkb | I've approved the parent of https://review.openstack.org/#/c/511624/1 if we can get a second review on that change it would be great to get this tested soon | 22:38 |
jeblair | done | 22:39 |
clarkb | tyty | 22:39 |
mordred | jeblair: I agree with the thing clarkb said - we could also do pip install -U /path/to/zuul/requirements.txt && pip install --no-deps -U /path/to/zuul | 22:40 |
jeblair | clarkb: yeah, i think your install procedure will work | 22:40 |
jeblair | i just tested that manually... | 22:40 |
jeblair | mordred: should i do clarkb's thing or try yours? | 22:40 |
clarkb | mordreds is likely a bit quicker | 22:41 |
mordred | jeblair: try mine - it's less churn - clarkb's will work - but will result in gitpython temporarily being changed | 22:41 |
fungi | and does also cause some dependencies to be installed and then reinstalled i guess | 22:41 |
clarkb | seems like inode cleanup is really flying now. I wonder if that is the addition of the undercloud/tmp cleanup? | 22:42 |
fungi | (in the case of the ones which have to be installed from git urls) | 22:42 |
fungi | clarkb: i have a feeling it's ci load dropping off for the evening | 22:42 |
fungi | we tend to make more progress on bulk cleanup tasks like this on the logs site off-hours and on weekends | 22:43 |
*** mriedem1 has quit IRC | 22:43 | |
clarkb | ah | 22:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/puppet-zuul master: Split zuul and requirements install https://review.openstack.org/511637 | 22:43 |
fungi | whereas during peak load it's lucky to be marching in place | 22:43 |
mordred | jeblair, clarkb: ^^ like that | 22:43 |
jeblair | oh that's easier than what i was about to do. :) | 22:43 |
jeblair | mordred: needs a "-r" though, right? | 22:44 |
mordred | jeblair: were you going to make a puppet resource dependency graph? | 22:44 |
jeblair | mordred: yes | 22:44 |
mordred | jeblair: YES | 22:44 |
openstackgerrit | Monty Taylor proposed openstack-infra/puppet-zuul master: Split zuul and requirements install https://review.openstack.org/511637 | 22:44 |
jeblair | lgtm | 22:45 |
clarkb | approved | 22:45 |
mordred | jeblair: does https://review.openstack.org/#/c/511533 and its parent look better to you now? | 22:45 |
jeblair | i re-installed deps manually and restarted all ze machines | 22:46 |
jeblair | mordred: yep | 22:47 |
clarkb | because I was curious I checked my local 4TB zpool's inode count and it has more than 7 times the number of inodes of our 12TB fs | 22:47 |
clarkb | (and that was just with default fs creation commands) | 22:47 |
clarkb | if we ever get around to moving this filesystem maybe we should multiply the inode count by some big number | 22:48 |
jeblair | ya | 22:48 |
mordred | who would ever need more than 640k of inodes | 22:48 |
mordred | s/0// | 22:48 |
mordred | bleh | 22:48 |
*** rbrndt has quit IRC | 22:52 | |
jeblair | fungi: want to send the status ok? | 22:54 |
*** iyamahat__ has quit IRC | 22:54 | |
jeblair | ianw: however, being at 105% of completion has thrown off my time estimates. :| | 22:54 |
ianw | i got all day :) | 22:55 |
ianw | i just hope it fixes it | 22:55 |
clarkb | jeblair: that is a neat trick | 22:55 |
jeblair | the file it currently has open is in the "/u/" directory | 22:56 |
jeblair | i have no idea how close to alphabetical it is though. | 22:56 |
fungi | jeblair: you bet | 22:56 |
jeblair | next time (please no) -- find > file, then reprepro < file. | 22:57 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add the ability to generate an ARA report only on job failure https://review.openstack.org/511622 | 22:57 |
*** iyamahat has joined #openstack-infra | 22:58 | |
fungi | status ok Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now | 22:59 |
fungi | that look okay? | 22:59 |
jeblair | ++ | 23:00 |
clarkb | yes | 23:00 |
ianw | jeblair: combined with "pv" in the middle, it might even come up with an accurate % | 23:00 |
fungi | #status ok Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now | 23:00 |
openstackstatus | fungi: sending ok | 23:00 |
*** aeng has quit IRC | 23:01 | |
*** ChanServ changes topic to "Discussion of OpenStack Developer and Community Infrastructure | docs http://docs.openstack.org/infra/ | bugs https://storyboard.openstack.org/ | source https://git.openstack.org/cgit/openstack-infra/ | channel logs http://eavesdrop.openstack.org/irclogs/%23openstack-infra/" | 23:03 | |
-openstackstatus- NOTICE: Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now | 23:03 | |
jeblair | ianw: oh it finished! | 23:03 |
jeblair | 13484 files were added but not used. | 23:03 |
jeblair | The next deleteunreferenced call will delete them. | 23:03 |
ianw | :/ that seems like ... a lot | 23:03 |
*** xarses has quit IRC | 23:04 | |
jeblair | ianw: i'm a little worried that maybe i should have started with step 1? | 23:04 |
ianw | ? i maybe wouldn't run the /usr/local/bin script, isn't deleteunreferenced a separate step there (checking ...) | 23:04 |
jeblair | i was assuming that references.db was okay, but i don't know that. (i'm not actually sure how to know that) | 23:04 |
openstackgerrit | Merged openstack-infra/puppet-zuul master: Split zuul and requirements install https://review.openstack.org/511637 | 23:05 |
ianw | hmm, that one i regenerated yesterday afternoon | 23:05 |
jeblair | ianw: okay, assuming that's okay, then i think we've done step1 and step2 | 23:06 |
openstackstatus | fungi: finished sending ok | 23:06 |
openstackgerrit | Merged openstack-infra/shade master: Image should be optional https://review.openstack.org/511299 | 23:06 |
jeblair | ianw: there is no step 3 | 23:06 |
clarkb | can we serve the read write volume before releasing it? | 23:07 |
jeblair | ianw: should i try running "reprepro update" now? | 23:07 |
ianw | jeblair: i would just run "k5start -t -f /etc/reprepro.keytab service/reprepro -- reprepro --confdir /etc/reprepro/ubuntu update" by hand | 23:07 |
ianw | yeah, that :) | 23:07 |
jeblair | clarkb: yeah, but we're not there yet i don't think | 23:07 |
openstackgerrit | Merged openstack-infra/shade master: Add method to set bootable flag on volumes https://review.openstack.org/502479 | 23:07 |
clarkb | gotcha | 23:07 |
ianw | oh with a -VVVV | 23:07 |
openstackgerrit | Merged openstack-infra/shade master: Allow domain_id for roles https://review.openstack.org/496992 | 23:07 |
openstackgerrit | Merged openstack-infra/shade master: Move role normalization to normalize.py https://review.openstack.org/500170 | 23:07 |
ianw | *hopefully* it does something other than peg at 100% cpu saying nothing | 23:07 |
jeblair | okay, i will copy the db files from the local disk into afs, delete *.old (i still have them on the local disk), then run that. | 23:08 |
jeblair | ianw: reprepro --confdir /etc/reprepro/ubuntu -VVVV update | 23:10 |
jeblair | look right? | 23:10 |
ianw | yep | 23:10 |
*** wolverineav has quit IRC | 23:10 | |
*** wolverineav has joined #openstack-infra | 23:10 | |
mordred | pabelanger, Shrews: remote: https://review.openstack.org/511643 Release 1.24.0 of shade ... patch submitted to cut new release of shade | 23:11 |
jeblair | good news! it did not hang. | 23:12 |
*** bobh has joined #openstack-infra | 23:12 | |
jeblair | the bad news: File "pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb" is already registered with different checksums! | 23:12 |
mordred | jeblair: grumble | 23:12 |
clarkb | that is a neat trick | 23:12 |
ianw | haha, i knew it was a Farnsworth "good news, everybody!" | 23:13 |
jeblair | http://paste.openstack.org/show/623503/ | 23:14 |
*** rwsu has joined #openstack-infra | 23:14 | |
jeblair | the "expected" values match what's in afs | 23:15 |
clarkb | pabelanger: first recheck of that ssh debugging change didn't fail, trying again | 23:15 |
*** hongbin has quit IRC | 23:15 | |
dmsimard | clarkb: did you get your answer for base-test ? | 23:15 |
dmsimard | clarkb: I figure we could just create an adhoc no-op job based off of base-test if there wasn't any. | 23:16 |
clarkb | dmsimard: ya we need to push a change to eg ozj once things merge to reparent to base-test | 23:16 |
clarkb | jeblair: does that imply the got: side is what is already registered? | 23:16 |
jeblair | clarkb: ya... i'm trying to extract a text version of the checksum db to examine | 23:17 |
*** aeng has joined #openstack-infra | 23:18 | |
jeblair | pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb :1:1c15ec44003064eb9e664462f764e98aa5e9d36c :2:e4ca9514867498531f1feea0d081c1d3df8d91d9b8bc0f353315e9a9a362e2a2 d7159bf89cc9df87ba64db43c7d8bd1a 1669873 | 23:18 |
ianw | the sizes aren't even close? | 23:18 |
jeblair | that also seems to match "expected" | 23:19 |
jeblair | what does "got" mean? | 23:19 |
jeblair | from whence was it "got"? | 23:19 |
clarkb | dmsimard: the update to base-test is in the gate now, so once that merges we just push a change to ozj or whatever to use base-test in some test (probably a test that runs against ozj) | 23:19 |
*** felipemonteiro has joined #openstack-infra | 23:19 | |
clarkb | jeblair: perhaps got is what the upstream mirror gave it? | 23:19 |
dmsimard | clarkb: right, I won't be at a keyboard for a while. If we want to test it soon, someone else can do it. | 23:20 |
clarkb | dmsimard: ok I'll push one up then | 23:20 |
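For reference, the reparenting change clarkb pushes shortly afterwards (511646) amounts to a one-line parent swap in the ozj job definitions; a minimal sketch, with the job name taken from the failure log linked a bit further down rather than from the actual patch:

```yaml
# Sketch only: point an integration job at base-test so proposed changes
# to the base job's playbooks get exercised before being copied to "base".
- job:
    name: base-integration-centos-7   # job name assumed for illustration
    parent: base-test                 # normally inherits from "base"
```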
jeblair | what is our upstream? | 23:20 |
dmsimard | clarkb: thanks! | 23:20 |
ianw | I think it means "I read the info from disk and got this value, so that's what i expect, but the database told me this other value"? | 23:20 |
jeblair | the lines before that were | 23:21 |
jeblair | processing updates for 'trusty-security|main|amd64' | 23:21 |
jeblair | reading '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_trusty-security_main_amd64_Packages' | 23:21 |
jeblair | so was 'got' from that file? | 23:21 |
jeblair | yes | 23:22 |
jeblair | the values in that file match "got" | 23:22 |
openstackgerrit | Merged openstack-infra/project-config master: Test ARA report generation only on failure in base-test https://review.openstack.org/511624 | 23:22 |
*** Apoorva has joined #openstack-infra | 23:23 | |
ianw | jeblair: it seems to be the wrong size ... see one i download from upstream in /tmp | 23:23 |
jeblair | ianw: yeah, that matches the index | 23:23 |
jeblair | so the file we have in our archive is wrong | 23:23 |
ianw | so maybe we just do this over and over, copying in the wrong stuff? | 23:23 |
pabelanger | mordred: I really like the idea of an abandon pipeline, that's a great idea. | 23:24 |
jeblair | or if we remove that file, and remove its entry from checksums.db, will it re-download it and add it? | 23:24 |
openstackgerrit | Clark Boylan proposed openstack-infra/openstack-zuul-jobs master: Reparent ozj integration jobs to base-test for testing https://review.openstack.org/511646 | 23:24 |
ianw | it's probably a lot easier to just wget it in than fiddle the db? | 23:24 |
clarkb | there is the ara test I think | 23:24 |
tonyb | Before I write one, is there a tool that will take a repo name and grovel around in the zuul(v2) config data and list all the jobs that will be run? (bonus points if it can check for branch exclusions) | 23:24 |
jeblair | ianw: i think if we wget it, we still have to fiddle the db to add the newly correct entry to checksums | 23:25 |
fungi | tonyb: not only project and target branch but also changed files in the diff can determine which jobs will run | 23:25 |
jeblair | ianw: there are commands to both remove a single entry from checksums, as well as add/update one. | 23:25 |
pabelanger | clarkb: cool, I'll add some rechecks to that too. We might also want to setup autohold of the job too | 23:26 |
fungi | tonyb: as well as the pipeline into which the ref is enqueued | 23:26 |
pabelanger | and caught up on backscroll | 23:26 |
ianw | jeblair: should we just delete the file and rerun the update, and see if it just downloads it? | 23:26 |
tonyb | fungi: That is true but for my use case today I don't think that matters | 23:26 |
ianw | maybe it can recover from that | 23:26 |
ianw | if not, move on to replacing | 23:27 |
tonyb | fungi: Hmm I will need to consider the pipeline | 23:27 |
jeblair | ianw: i'm pretty sure we need a db fiddle either way, cause i *think* what's happening here is a comparison of checksums.db with the package list. i don't know if it's going out to actual files at all. | 23:27 |
jeblair | ianw: so i think we should either remove the file and remove the checksums.db entry; or replace the file and replace the checksums.db entry. | 23:27 |
pabelanger | mordred: 511643 has some errors | 23:28 |
jeblair | ianw: i'm hopeful that if we did the second thing, it would auto-correct. i have no basis other than hope for that though. | 23:28 |
EmilienM | I know you're very busy but if someone can ping me when release-tarball jobs are kicked off again, thanks a lot | 23:28 |
ianw | jeblair: i'm just hoping it stat()s the file or something (i have no idea). i think start simple, remove the file from disk and try update, see what happens | 23:28 |
ianw | just reading up on the checksum remove cmds now | 23:29 |
*** Swami has quit IRC | 23:29 | |
clarkb | ianw: not to completely distract you from the ubuntu mirror but do you know if there is a fix for http://logs.openstack.org/46/511646/1/infra-check/base-integration-centos-7/910a17b/job-output.txt.gz#_2017-10-12_23_27_37_155946 pushed up yet? | 23:29 |
jeblair | ianw: okay, i'm happy to try 1) remove file; rerun. then if that fails, 2) also remove checksum; rerun. | 23:29 |
mordred | pabelanger: so it does :) | 23:29 |
*** caphrim007 has joined #openstack-infra | 23:29 | |
clarkb | ianw: need to check ansible_default_ipv6 is defined instead of ansible_default_ipv6.address is defined | 23:29 |
*** aeng has quit IRC | 23:30 | |
jeblair | ianw: i'll wait for you to finish reading before i execute. | 23:30 |
fungi | EmilienM: they should be safe to run now. i know there were several tripleo releases i need to reenqueue but was hoping someone on the release team could put together a list of all releases that need reenqueuing besides those so i can do them all in one batch (per my e-mail to the dev list) | 23:30 |
ianw | jeblair: ++ on that plan | 23:30 |
jeblair | tonyb: what's the use case (curiosity) | 23:30 |
*** caphrim007_ has joined #openstack-infra | 23:30 | |
ianw | clarkb: ahh, no i haven't. does that explain the occasional errors? | 23:30 |
clarkb | ianw: I think so | 23:30 |
clarkb | ianw: should I go ahead and push a patch or do you want to? | 23:30 |
ianw | that would be nice, i wasn't looking forward to debugging that | 23:30 |
ianw | i can, let me pull it up | 23:31 |
tonyb | jeblair: I want a list of all the jobs (and nodes) that tripleo runs to help understand the impact of keeping stable/newton around for longer | 23:31 |
ianw | clarkb: that's weird though, i thought that variable was always defined, and just blank | 23:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Let v2 publish shade releases again https://review.openstack.org/511649 | 23:31 |
mordred | pabelanger: ^^ | 23:31 |
ianw | i couldn't find it documented though, maybe i didn't look hard enough | 23:31 |
clarkb | ianw: seems to imply it's not defined if there's no ipv6 | 23:31 |
tonyb | jeblair: if it were 1 or 2 repos I'd just do it by hand but ... | 23:31 |
mordred | pabelanger: we need to re-enable publish-to-pypi for shade in v2 :) | 23:32 |
pabelanger | clarkb: I've added auto-hold for gate-infra-puppet-apply-3-centos-7 on nodepool.o.o | 23:32 |
jeblair | tonyb: gotcha. fwiw, i expect us to have a rest api in zuulv3 in a couple of months that would help with this sort of thing. | 23:32 |
tonyb | jeblair: \o/ | 23:32 |
pabelanger | mordred: ack | 23:32 |
clarkb | tonyb: globally tripleo was ~1/3 of all jobs run when I last checked | 23:32 |
mordred | clarkb, jeblair, ianw: if you have a sec - https://review.openstack.org/511649 is needed for us to cut a shade release | 23:32 |
jeblair | ianw: no change after rming the file | 23:32 |
pabelanger | mordred: do we need to disable in zuulv3? | 23:33 |
jeblair | ianw: will proceed to checksums.db surgery | 23:33 |
mordred | pabelanger: no - we have those pipelines disabled in v3 anyway | 23:33 |
tonyb | clarkb: Wow | 23:33 |
clarkb | tonyb: er not jobs run | 23:33 |
clarkb | tonyb: sorry it was a cpu time calculation | 23:33 |
clarkb | tripleo was 1/3 of all cpu usage | 23:33 |
tonyb | clarkb: Ahh okay. that's less shocking ;P | 23:34 |
pabelanger | mordred: kk | 23:34 |
*** caphrim007 has quit IRC | 23:34 | |
ianw | clarkb: do you know if ansible is a "&&" or a "&" ? i.e. is "- ansible_default_ipv6 is defined" then "- ansible_default_ipv6.address is defined" going to bail too? | 23:34 |
openstackgerrit | Lin Yang proposed openstack-infra/project-config master: Add OpenStack client check to python-rsdclient https://review.openstack.org/511650 | 23:35 |
ianw | cause i'm sure i saw that being blank in the no-routable-address case | 23:35 |
clarkb | ianw: I'm not sure if it will short circuit | 23:35 |
clarkb | dmsimard: pabelanger mordred ^ do you know? | 23:35 |
jeblair | ianw: it is now doing more things. | 23:35 |
jeblair | looks like it's actually downloading package files. | 23:36 |
ianw | \o/ | 23:36 |
pabelanger | clarkb: looking | 23:36 |
jeblair | reprepro --confdir /etc/reprepro/ubuntu -VVVV _forget pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb | 23:36 |
jeblair | was the command i ran to drop the checksums.db entry for that, btw | 23:36 |
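Putting the recovery steps together, the sequence that worked was roughly the following; the pool path on the RW volume and the k5start wrapping are assumptions based on the paths and commands quoted earlier, since the exact interactive invocations aren't all shown in channel:

```shell
# 1. Remove the bad package file from the pool on the RW AFS tree
#    (path assumed from the lists path quoted above).
rm /afs/.openstack.org/mirror/ubuntu/pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb

# 2. Drop its stale checksums.db entry so reprepro forgets the old sums.
k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu -VVVV \
    _forget pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb

# 3. Re-run the update; reprepro re-downloads the package and registers
#    the correct checksums.
k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu -VVVV update
```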
*** thorst has joined #openstack-infra | 23:37 | |
dmsimard | ianw: I don't understand the question, probably missing context | 23:37 |
ianw | when: | 23:37 |
ianw | - ansible_default_ipv6 is defined | 23:37 |
ianw | - ansible_default_ipv6.address is defined | 23:37 |
ianw | dmsimard: ^ does that work when ansible_default_ipv6 is not defined at all is basically the question | 23:37 |
dmsimard | If you're only interested in the second condition, it should work by itself without the first one but I'd test it first to make sure | 23:38 |
clarkb | dmsimard: we think the first is required because of http://logs.openstack.org/46/511646/1/infra-check/base-integration-centos-7/910a17b/job-output.txt.gz#_2017-10-12_23_27_37_155946 | 23:38 |
ianw | dmsimard: empirical evidence shows it doesn't, but i agree i thought it would too :) | 23:39 |
*** bobh has quit IRC | 23:39 | |
mnaser | but is it possible that ansible_default_ipv6 is defined but without an address? | 23:39 |
ianw | mnaser: i'm pretty sure i saw that with ipv6 but no routable address | 23:39 |
pabelanger | ianw: clarkb: we might want to use nodepool.public_ipv6 | 23:39 |
pabelanger | which we setup in inventory with zuul | 23:40 |
clarkb | mnaser: the log message is "'ansible_default_ipv6' is undefined" | 23:40 |
clarkb | mnaser: implying it's the root var that does not exist | 23:40 |
pabelanger | then you can when: nodepool.public_ipv6 | 23:40 |
dmsimard | ianw, clarkb: I have a sandbox on my laptop just to keep testing this sort of junk with conditionals and other things. It's never straightforward :( | 23:40 |
pabelanger | http://logs.openstack.org/69/511069/1/infra-check/project-config-nodepool/d1f74c9/zuul-info/host-info.ubuntu-xenial.yaml | 23:40 |
pabelanger | ansible_default_ipv6: {} | 23:41 |
mnaser | clarkb: which is why i'm saying if its possible that "ansible_default_ipv6" is defined with "ansible_default_ipv6.address" undefined. if that's the case, the second conditional can be dropped | 23:41 |
ianw | pabelanger: that's cool ... i mean this "default_ipv6" *should* be exactly what we want to express, as it seems to only put a routable ipv6 in there | 23:41 |
mnaser | but i guess pabelanger just confirmed it can be | 23:41 |
openstackgerrit | Merged openstack-infra/project-config master: Let v2 publish shade releases again https://review.openstack.org/511649 | 23:41 |
pabelanger | so, need to check if ansible_default_ipv6 is empty | 23:42 |
pabelanger | which means, no ipv6 | 23:42 |
clarkb | I think the two checks ianw has above should cover all cases | 23:42 |
pabelanger | or, nodepool.public_ipv6 | 23:42 |
pabelanger | which is str | 23:42 |
*** thorst has quit IRC | 23:42 | |
clarkb | it's just a matter of knowing if ansible short-circuits or not | 23:42 |
clarkb | I guess we might also want to check that the address is not empty | 23:42 |
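As a sketch of where this is heading (not a tested fix, and it assumes Ansible combines when: list entries with a short-circuiting "and", which is exactly the open question here), the guard could spell out every condition explicitly, or lean on the nodepool-provided inventory variable pabelanger mentions:

```yaml
# Sketch only: skip IPv6-dependent work on hosts where ansible_default_ipv6
# is missing entirely, or present but empty ({}), or has a blank address.
- name: Do something that needs a routable IPv6 address
  debug:
    msg: "default IPv6 address is {{ ansible_default_ipv6.address }}"
  when:
    - ansible_default_ipv6 is defined
    - ansible_default_ipv6.address is defined
    - ansible_default_ipv6.address | length > 0

# Alternative discussed above, using the string set up in the inventory:
#   when: nodepool.public_ipv6 != ""
```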
mnaser | and until the new release on ansible changes the behaviour too (ha, ha, ha :-P) | 23:42 |
ianw | clarkb: such confusion! i think i will propose an update to the ansible doc page when we figure this out | 23:43 |
clarkb | mnaser: indeed | 23:43 |
*** aeng has joined #openstack-infra | 23:43 | |
* mnaser goes back to getting pdfs to accountant who is unable to unzip an archived file | 23:43 | |
ianw | mnaser: just fax them | 23:44 |
mnaser | so his specific request: please attach every single pdf into the email without a zip file, because that stuff is complicated .. | 23:44 |
clarkb | fungi: is the undercloud/tmp delete done? I'm not seeing it in ps | 23:45 |
pabelanger | clarkb: ianw: you can use with_dict: ansible_default_ipv6 | 23:45 |
pabelanger | then {{ item.address }} | 23:45 |
pabelanger | and should do the right thing | 23:45 |
*** markvoelker has quit IRC | 23:46 | |
*** gouthamr has joined #openstack-infra | 23:46 | |
ianw | and check if that's defined? | 23:46 |
pabelanger | ya, or | default({}) | 23:46 |
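pabelanger's default({}) idea can also be used directly in the conditional instead of through with_dict (where the loop variable would actually be item.key/item.value pairs rather than item.address); a minimal sketch, assuming the goal is just a defined, non-empty address:

```yaml
# Sketch only: coerce a missing ansible_default_ipv6 to an empty dict so
# the .address lookup never touches an undefined variable.
- name: Do something that needs a routable IPv6 address
  debug:
    msg: "default IPv6 address is {{ ansible_default_ipv6.address }}"
  when: (ansible_default_ipv6 | default({})).address | default('') | length > 0
```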
openstackgerrit | Merged openstack-infra/system-config master: Remove npm / rubygem crontab entries https://review.openstack.org/473911 | 23:46 |
*** bobh has joined #openstack-infra | 23:47 | |
EmilienM | fungi: ack, thx for the update | 23:48 |
*** tosky has quit IRC | 23:49 | |
fungi | clarkb: still running. it's in a window of the root screen session there | 23:49 |
fungi | i've cycled to that window now, you should see it if you attach | 23:50 |
clarkb | fungi: thanks | 23:50 |
clarkb | fungi: that is the log archive maintenance script? | 23:51 |
fungi | nope | 23:51 |
*** wolverineav has quit IRC | 23:51 | |
fungi | when you pointed out there was a missing subdirectory level, i went back to the original set of three patterns to delete and added the additional */ | 23:51 |
ianw | why did i start looking : https://github.com/ansible/ansible/issues/23675 | 23:51 |
clarkb | apparently screen -x maintains different windows on different attaches | 23:51 |
fungi | oh neat | 23:51 |
fungi | well, anyway, it's one of the three windows under that root screen session | 23:52 |
fungi | the other two are the v3 log deletion and the 2-week expiration | 23:53 |
clarkb | yup I see it now | 23:53 |
jeblair | ianw: reprepro finished | 23:54 |
jeblair | the file i deleted exists now and is the correct size | 23:55 |
ianw | yay team! | 23:55 |
jeblair | 818 files lost their last reference. | 23:55 |
jeblair | (dumpunreferenced lists such files, use deleteunreferenced to delete them.) | 23:55 |
pabelanger | great work | 23:55 |
jeblair | okay, what do we want to do next? | 23:55 |
jeblair | clarkb: i think you suggested that we switch one or more mirrors to serving from the rw volume, yeah? | 23:56 |
jeblair | that's basically a single character apache config change | 23:56 |
clarkb | jeblair: ya and then make sure that jobs are happy with the mirror before we commit to it via vos release | 23:56 |
pabelanger | we should increase the timeout kill value too, I don't think 30m is enough time | 23:56 |
pabelanger | maybe make it 90 | 23:56 |
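For context on what that change touches: assuming the cron wrapper runs reprepro under GNU timeout (the actual script contents aren't quoted in channel, so the flags here are illustrative), the bump would look roughly like:

```shell
# Illustrative only: extend the wall-clock limit from 30 to 90 minutes so a
# long update isn't killed partway through, leaving the lockfile behind.
timeout -k 2m 90m \
    k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu update
```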
clarkb | jeblair: ianw do we want to consider rerunning reprepro again and seeing if it mostly noops? | 23:57 |
jeblair | pabelanger: can you write that change? let's get it merged before we turn cron back on | 23:57 |
ianw | i would; run the full /usr/local/bin script | 23:57 |
pabelanger | sure | 23:57 |
jeblair | ianw: that will do the vos release which i don't want to do just yet | 23:57 |
clarkb | fungi: looking at ps I'm worried that that rm has been spending all of its time globbing things? | 23:57 |
jeblair | ianw: but i can do the rest of the reprepro steps there | 23:57 |
ianw | oh, yeah, with that commented out, and maybe the timeout commented out too | 23:57 |
jeblair | ianw: so maybe deleteunreferenced next? | 23:57 |
clarkb | fungi: we might need to run that through find too instead? | 23:58 |
clarkb | fungi: strace seems to agree that it's just sitting there for the most part | 23:58 |
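If the shell glob really is the bottleneck, one alternative (the path and pattern here are purely illustrative, since the real ones aren't quoted in channel) is to let find stream matches straight into the delete instead of expanding one huge glob up front:

```shell
# Illustrative only: find prunes each matching directory as it goes rather
# than building a single enormous argument list in the shell first.
find /path/to/logs -type d -name 'tmp' -prune -exec rm -rf {} +
```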
ianw | jeblair: i think so, since better to know what happens now than when cron hits it | 23:58 |
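Per reprepro's own hint a few lines up, inspecting and then clearing those 818 orphaned pool files would be roughly the following (same confdir as the earlier commands; the k5start wrapping is again an assumption):

```shell
# List pool files that no longer have any reference, then delete them.
k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu dumpunreferenced
k5start -t -f /etc/reprepro.keytab service/reprepro -- \
    reprepro --confdir /etc/reprepro/ubuntu deleteunreferenced
```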