*** bobh has quit IRC | 00:10 | |
*** martinkennelly has quit IRC | 00:23 | |
*** hwoarang_ has joined #openstack-infra | 01:33 | |
*** hwoarang has quit IRC | 01:34 | |
*** wolverineav has joined #openstack-infra | 01:39 | |
*** bhavikdbavishi has joined #openstack-infra | 01:47 | |
*** wolverineav has quit IRC | 01:57 | |
*** bobh has joined #openstack-infra | 01:57 | |
openstackgerrit | zhouxinyong proposed openstack/diskimage-builder master: Delete the duplicate words in 50-zipl https://review.openstack.org/628815 | 02:02 |
*** bobh has quit IRC | 02:05 | |
*** jamesmcarthur has joined #openstack-infra | 02:24 | |
*** bhavikdbavishi has quit IRC | 02:37 | |
*** hongbin has joined #openstack-infra | 02:47 | |
*** jamesmcarthur has quit IRC | 02:50 | |
*** whoami-rajat has joined #openstack-infra | 02:52 | |
*** psachin has joined #openstack-infra | 02:59 | |
*** hwoarang has joined #openstack-infra | 03:10 | |
*** hwoarang_ has quit IRC | 03:12 | |
*** wolverineav has joined #openstack-infra | 03:19 | |
openstackgerrit | zhurong proposed openstack-infra/project-config master: Retire murano-deployment https://review.openstack.org/628850 | 03:26 |
*** hongbin has quit IRC | 03:39 | |
openstackgerrit | Merged openstack-infra/project-config master: Add 'Review-Priority' for Zaqar repos https://review.openstack.org/628323 | 03:46 |
*** jamesmcarthur has joined #openstack-infra | 03:51 | |
*** bhavikdbavishi has joined #openstack-infra | 03:55 | |
*** ramishra has joined #openstack-infra | 04:01 | |
*** bhavikdbavishi has quit IRC | 04:02 | |
*** bhavikdbavishi has joined #openstack-infra | 04:02 | |
*** jamesmcarthur has quit IRC | 04:18 | |
*** wolverineav has quit IRC | 04:23 | |
*** udesale has joined #openstack-infra | 04:33 | |
*** bobh_ has joined #openstack-infra | 04:38 | |
openstackgerrit | zhurong proposed openstack-infra/project-config master: Retire murano-deployment https://review.openstack.org/628850 | 05:14 |
*** jamesmcarthur has joined #openstack-infra | 05:19 | |
*** jamesmcarthur has quit IRC | 05:23 | |
*** bobh_ has quit IRC | 05:29 | |
*** ykarel has joined #openstack-infra | 05:49 | |
*** diablo_rojo has joined #openstack-infra | 05:50 | |
*** bobh_ has joined #openstack-infra | 05:51 | |
*** bobh_ has quit IRC | 05:56 | |
*** hwoarang_ has joined #openstack-infra | 06:02 | |
*** hwoarang has quit IRC | 06:03 | |
*** bhavikdbavishi has quit IRC | 06:05 | |
*** wolverineav has joined #openstack-infra | 06:07 | |
*** armax has quit IRC | 06:11 | |
*** wolverineav has quit IRC | 06:12 | |
*** bobh_ has joined #openstack-infra | 06:41 | |
*** jtomasek has joined #openstack-infra | 06:53 | |
*** rcernin has quit IRC | 06:56 | |
*** apetrich has joined #openstack-infra | 07:06 | |
*** AJaeger has quit IRC | 07:11 | |
openstackgerrit | zhurong proposed openstack-infra/project-config master: Retire murano-deployment https://review.openstack.org/628850 | 07:11 |
*** AJaeger has joined #openstack-infra | 07:20 | |
*** bhavikdbavishi has joined #openstack-infra | 07:24 | |
*** dpawlik has joined #openstack-infra | 07:29 | |
*** ykarel is now known as ykarel|lunch | 07:29 | |
*** bobh_ has quit IRC | 07:34 | |
*** adriancz has joined #openstack-infra | 07:35 | |
*** agopi_ has joined #openstack-infra | 07:37 | |
*** rpittau has joined #openstack-infra | 07:38 | |
*** agopi_ is now known as agopi | 07:40 | |
*** tosky has joined #openstack-infra | 07:45 | |
*** yamamoto has joined #openstack-infra | 07:56 | |
*** yamamoto has quit IRC | 07:58 | |
*** ginopc has joined #openstack-infra | 07:59 | |
*** diablo_rojo has quit IRC | 08:02 | |
*** aojea has joined #openstack-infra | 08:08 | |
*** bobh_ has joined #openstack-infra | 08:09 | |
*** pcaruana has joined #openstack-infra | 08:14 | |
*** agopi has quit IRC | 08:18 | |
*** kjackal has joined #openstack-infra | 08:22 | |
*** ykarel|lunch is now known as ykarel | 08:23 | |
*** pcaruana has quit IRC | 08:24 | |
*** rascasoft has joined #openstack-infra | 08:25 | |
*** xek has joined #openstack-infra | 08:39 | |
*** yamamoto has joined #openstack-infra | 08:42 | |
*** rcarrillocruz has joined #openstack-infra | 08:45 | |
*** pcaruana has joined #openstack-infra | 08:46 | |
*** bobh_ has quit IRC | 08:55 | |
*** jpich has joined #openstack-infra | 08:57 | |
*** yamamoto has quit IRC | 09:05 | |
*** ykarel has quit IRC | 09:06 | |
*** ykarel has joined #openstack-infra | 09:06 | |
*** bobh_ has joined #openstack-infra | 09:16 | |
*** gfidente has joined #openstack-infra | 09:27 | |
*** shardy has joined #openstack-infra | 09:31 | |
*** owalsh_ is now known as owalsh | 09:35 | |
*** derekh has joined #openstack-infra | 09:36 | |
*** ssbarnea|bkp2 has joined #openstack-infra | 09:37 | |
*** ssbarnea has quit IRC | 09:39 | |
*** wolverineav has joined #openstack-infra | 09:49 | |
*** wolverineav has quit IRC | 09:53 | |
*** yamamoto has joined #openstack-infra | 09:53 | |
*** jaosorior has joined #openstack-infra | 10:03 | |
*** dtantsur|afk is now known as dtantsur | 10:08 | |
*** agopi has joined #openstack-infra | 10:17 | |
*** roman_g has joined #openstack-infra | 10:18 | |
*** bobh_ has quit IRC | 10:19 | |
*** agopi is now known as agopi_ | 10:20 | |
*** agopi_ is now known as agopi | 10:21 | |
*** bhavikdbavishi has quit IRC | 10:21 | |
*** gfidente has quit IRC | 10:27 | |
openstackgerrit | Merged openstack-infra/zuul master: dict_object.keys() is not required for *in* operator https://review.openstack.org/621482 | 10:27 |
*** sshnaidm|off is now known as sshnaidm | 10:30 | |
*** yamamoto has quit IRC | 10:35 | |
*** arxcruz|brb is now known as arxcruz | 10:35 | |
openstackgerrit | Merged openstack/ptgbot master: Pin irc module to 15.1.1 to avoid import error https://review.openstack.org/626906 | 10:36 |
openstackgerrit | Merged openstack/ptgbot master: Generate PTGbot index page dynamically https://review.openstack.org/626907 | 10:37 |
*** mpeterson has quit IRC | 10:40 | |
*** mpeterson has joined #openstack-infra | 10:42 | |
*** mpeterson has quit IRC | 10:42 | |
*** yamamoto has joined #openstack-infra | 10:45 | |
*** mpeterson has joined #openstack-infra | 10:54 | |
*** udesale has quit IRC | 10:54 | |
*** gfidente has joined #openstack-infra | 11:02 | |
*** mpeterson has quit IRC | 11:05 | |
*** mpeterson has joined #openstack-infra | 11:06 | |
*** pbourke has quit IRC | 11:07 | |
*** pbourke has joined #openstack-infra | 11:09 | |
*** pcaruana has quit IRC | 11:11 | |
*** pcaruana has joined #openstack-infra | 11:16 | |
ssbarnea|bkp2 | infra-root: need review on https://review.openstack.org/#/c/625576/ -- to undo the damaging unsafe umask on src folder | 11:19 |
jhesketh | ssbarnea|bkp2: lgtm | 11:24 |
*** kjackal has quit IRC | 11:34 | |
*** tobias-urdin is now known as tobias-urdin_afk | 11:34 | |
*** kjackal has joined #openstack-infra | 11:34 | |
*** rpittau is now known as rpittau|lunch | 11:41 | |
ssbarnea|bkp2 | jhesketh: thanks. this one is more important than it appears as it has some unexpected side effects, also preventing us from fixing the random post timeouts. | 11:45 |
ssbarnea|bkp2 | ... not to mention that the original approach did not make much sense anyway :D | 11:45 |
jhesketh | right, I'm not comfortable pushing it through without a second review though as it's a trusted repo. Additionally, given its potential effects it should probably be babysat, and sadly it's late here | 11:46 |
ssbarnea|bkp2 | jhesketh: totally agree. i am sure we will get others from US. | 11:48 |
*** yamamoto has quit IRC | 11:50 | |
*** yamamoto has joined #openstack-infra | 11:53 | |
*** bhavikdbavishi has joined #openstack-infra | 11:54 | |
*** agopi has quit IRC | 11:56 | |
*** dpawlik has quit IRC | 11:58 | |
*** tobias-urdin_afk is now known as tobias-urdin | 11:59 | |
* SpamapS tries treating his insomnia with updating the shade package in Debian. It's not working... but at least shade will be up to date. :-P | 12:12 | |
openstackgerrit | Slawek Kaplonski proposed openstack-infra/openstack-zuul-jobs master: Remove tempest-dsvm-neutron-scenario-linuxbridge job definition https://review.openstack.org/628942 | 12:13 |
*** bobh_ has joined #openstack-infra | 12:16 | |
*** dkehn has quit IRC | 12:21 | |
*** bobh_ has quit IRC | 12:21 | |
*** agopi has joined #openstack-infra | 12:24 | |
*** dpawlik has joined #openstack-infra | 12:25 | |
frickler | corvus: did you decide when to do the k8s walkthrough? seems like tomorrow is preferred, but it would be good if I could know for sure soon, so I can plan my evenings a bit | 12:26 |
*** yamamoto has quit IRC | 12:26 | |
*** yamamoto has joined #openstack-infra | 12:29 | |
*** ykarel is now known as ykarel|afk | 12:30 | |
*** zigo has quit IRC | 12:31 | |
SpamapS | frickler: IIRC k8s walkthrough is on Tuesday (US/Pacific), not sure the exact time. | 12:32 |
*** bhavikdbavishi has quit IRC | 12:32 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Extract common config parsing for ProviderConfig https://review.openstack.org/625094 | 12:34 |
*** zhangfei has joined #openstack-infra | 12:37 | |
*** rlandy has joined #openstack-infra | 12:42 | |
frickler | SpamapS: after the infra meeting is the first option on the ethercalc, wednesday the second one, so that would match my assumption | 12:51 |
SpamapS | ya | 12:52 |
*** yamamoto has quit IRC | 12:52 | |
frickler | infra-root: system-config-run-base-ansible-devel seems to be failing since friday, "ERROR! Unexpected Exception, this is probably a bug: 'PlaybookCLI' object has no attribute 'options'" http://logs.openstack.org/16/628216/4/check/system-config-run-base-ansible-devel/2fbf1ef/job-output.txt.gz#_2019-01-04_17_21_05_827683 | 12:52 |
*** evrardjp has joined #openstack-infra | 12:54 | |
*** rpittau|lunch is now known as rpittau | 12:56 | |
*** evrardjp has quit IRC | 12:59 | |
*** evrardjp has joined #openstack-infra | 12:59 | |
*** jhesketh has quit IRC | 13:05 | |
*** jhesketh has joined #openstack-infra | 13:06 | |
frickler | looks like this might be the culprit https://github.com/ansible/ansible/commit/afdbb0d9d5bebb91f632f0d4a1364de5393ba17a | 13:08 |
*** kaiokmo has joined #openstack-infra | 13:09 | |
frickler | possibly a genuine upstream bug instead of some bad use of internals on our side | 13:09 |
mordred | frickler: yah - I think so - we don't use the python api for running playbooks | 13:10 |
mordred | frickler: oh - wait - I think we might be using CLI options in callback plugins | 13:11 |
mordred | frickler: ok. I'm going to stop responding until I've had more coffee | 13:12 |
mordred | frickler: that's us running ansible in a job to run base.yaml - so it isn't a zuul-side error, so yeah, I'd say that's most likely to be an ansible bug | 13:13 |
*** ykarel|afk is now known as ykarel | 13:15 | |
*** boden has joined #openstack-infra | 13:18 | |
*** wolverineav has joined #openstack-infra | 13:25 | |
openstackgerrit | Merged openstack-infra/project-config master: Add fetch-output to base job https://review.openstack.org/511851 | 13:26 |
*** trown|outtypewww is now known as trown | 13:27 | |
*** tmorin has joined #openstack-infra | 13:29 | |
*** wolverineav has quit IRC | 13:29 | |
*** boden has quit IRC | 13:30 | |
tmorin | hi infraroot: would someone be available to freeze and open access to a CI devstack VM, to allow me to investigate a failure I can't manage to reproduce locally ? | 13:30 |
*** dave-mccowan has joined #openstack-infra | 13:31 | |
tmorin | (the job is legacy-tempest-dsvm-networking-bgpvpn-bagpipe for change 626895,3) | 13:31 |
*** tmorin has left #openstack-infra | 13:31 | |
*** tmorin has joined #openstack-infra | 13:31 | |
*** ykarel is now known as ykarel|away | 13:32 | |
tmorin | infra-root ^ (perhaps more likely to ping someone than 'infraroot') | 13:32 |
tmorin | thanks in advance | 13:32 |
SpamapS | mordred: ty for the shade +A .. I'm getting 1.30.0 into Debian and finally fixing the RC that kept it out of all releases (usr/bin/shade-inventory in python- and python3-) | 13:32 |
SpamapS | having to remember some incantations though | 13:32 |
SpamapS | mordred: quite nice to drop all of those old build-deps for openstacksdk. | 13:33 |
*** udesale has joined #openstack-infra | 13:36 | |
frickler | tmorin: looking | 13:38 |
mordred | SpamapS: ++ - I enjoyed your tweet about the packaging this morning | 13:38 |
tmorin | frickler: thanks! (I hit 'recheck' minutes ago, after seeing a failure with many jobs) | 13:39 |
SpamapS | mordred: yeah, I just wish I was asleep instead of tweeting to your delight. ;) | 13:40 |
tmorin | "ERROR Unable to find playbook /var/lib/zuul/builds/cf13bfca241e43f890868f4c09ce963c/trusted/project_0/git.openstack.org/openstack-infra/project-config/playbooks/base/post-ssh.yaml" -> this seems unusual, although totally unrelated to the issue I'm trying to investigate | 13:41 |
*** jamesmcarthur has joined #openstack-infra | 13:41 | |
tmorin | frickler: the job isn't yet running, it's currently in "queued" status | 13:42 |
mnaser | i just ran into the same error that tmorin ran into for my job | 13:42 |
frickler | tmorin: hmm, it looks like we may have broken project-config | 13:42 |
mnaser | i mean | 13:43 |
mnaser | the last merge into project-config is | 13:43 |
frickler | mordred: this relates to https://review.openstack.org/#/c/511851/ ^^ | 13:43 |
mnaser | yeah | 13:43 |
frickler | networking-bgpvpn-dsvm-functional networking-bgpvpn-dsvm-functional : ERROR Unable to find playbook /var/lib/zuul/builds/551b054a32fb425caa81d8ef5ba4ca2d/trusted/project_0/git.openstack.org/openstack-infra/project-config/playbooks/base/post-ssh.yaml | 13:43 |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/project-config master: Revert "Add fetch-output to base job" https://review.openstack.org/628967 | 13:44 |
frickler | infra-root: ^^ probably need this until we have a better fix | 13:44 |
mnaser | we probably missed a post-ssh.yaml somewhere | 13:45 |
mnaser | yep | 13:46 |
mnaser | we missed the one | 13:46 |
mnaser | inside 'base' | 13:46 |
frickler | this rather looks like a bad rebase to me | 13:46 |
*** smarcet has joined #openstack-infra | 13:46 | |
*** jcoufal has joined #openstack-infra | 13:46 | |
frickler | meh, the revert fails zuul, too | 13:47 |
*** ykarel|away has quit IRC | 13:47 | |
mnaser | we might need an infra-root to force merge | 13:47 |
frickler | mnaser: do you think you have an easy fix? otherwise I'd force-merge the revert | 13:47 |
fungi | mordred: pabelanger: you've been working on the fetch-output stuff, right? ^ | 13:47 |
mnaser | i mean | 13:47 |
mnaser | i see a post-ssh.yaml that is referenced in the base job | 13:47 |
mordred | uhoh. yeah - let's revert asap | 13:48 |
frickler | fungi: yes, https://review.openstack.org/511851 merged 20 minutes ago, seems it broke everything | 13:48 |
openstackgerrit | Mohammed Naser proposed openstack-infra/project-config master: fetch-output: switch base to use post.yaml https://review.openstack.org/628970 | 13:48 |
mnaser | mordred: frickler ^ | 13:48 |
mnaser | i think that's the issue | 13:49 |
mordred | mnaser: wow, how did we miss that | 13:49 |
mordred | we could just slam that one in instead - I think we're going to need a force-merge in either case | 13:49 |
mnaser | ill leave the decision of revert OR force-merge that up to whoever | 13:49 |
frickler | it will probably fail in zuul, too, yes | 13:49 |
frickler | mordred: I'll leave it to you to clean up your patch, if you don't mind | 13:50 |
mordred | kk. I'm going to force-merge mnaser's patch | 13:50 |
openstackgerrit | Merged openstack-infra/project-config master: fetch-output: switch base to use post.yaml https://review.openstack.org/628970 | 13:50 |
mnaser | ok lets see if we unbroke things now | 13:50 |
*** kgiusti has joined #openstack-infra | 13:51 | |
mordred | oh. sigh. I only set it on base-minimal before. sorry about that! | 13:51 |
mnaser | we know who's buying first round of drinks next ptg! | 13:51 |
mordred | yes! | 13:52 |
fungi | ibm? ;) | 13:52 |
evrardjp | haha | 13:52 |
mordred | thanks ginny | 13:52 |
fungi | sorry, too soon | 13:53 |
mnaser | i see jobs starting up | 13:53 |
mnaser | i think we're good | 13:53 |
*** boden has joined #openstack-infra | 13:53 | |
fungi | thanks for spotting that! | 13:54 |
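For context on the failure above: a Zuul job references its playbooks by repo-relative path, so renaming playbooks/base/post-ssh.yaml without also updating the job description leaves every job inheriting from base pointing at a file that no longer exists. A minimal sketch of the shape involved (the real definitions in project-config carry many more options):

```yaml
- job:
    name: base
    description: The base job every other job ultimately inherits from.
    pre-run: playbooks/base/pre.yaml
    # Before 628970 this still said playbooks/base/post-ssh.yaml, which is
    # why Zuul reported "Unable to find playbook ... post-ssh.yaml":
    post-run: playbooks/base/post.yaml
```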
mordred | frickler, fungi, mnaser: since you all have fetch-output stuff paged in - there's a patch that adds functional tests too: https://review.openstack.org/#/c/628731/ | 13:56 |
frickler | tmorin: o.k., now your job should be running properly and will be held once/if it fails. which ssh key shall I use for access? | 13:57 |
mnaser | mordred: btw, i would suggest getting in the habit of using 'is' instead of '|' | 13:57 |
mnaser | i think | is being dropped soon, it throws deprecation warnings all over runs in newer versions of ansible | 13:58 |
tmorin | frickler: to be able to troubleshoot the problematic OVS state, I need to prevent tempest cleanup steps from happening | 13:58 |
mordred | mnaser: ah - good call | 13:58 |
tmorin | frickler: I already tweaked the tempest test to do a sleep(10000) at the right place | 13:59 |
mnaser | mordred: it also reads easier sometimes, 'log_directory is not changed' | 13:59 |
mnaser | but anyways, that's just a nit :) | 13:59 |
tmorin | frickler: so we don't need to wait for a failure to freeze it | 13:59 |
mordred | mnaser: totally. I'm going to do a followup that changes those - and also the ones that I cargo-culted from :) | 13:59 |
tmorin | frickler: sending you my pub key by PM | 13:59 |
mordred | mnaser: does | succeeded go to is succeeded too? | 13:59 |
mnaser | yep mordred | 14:00 |
frickler | tmorin: oh, o.k., then I need to dig into finding the correct node before it is being listed as held | 14:00 |
mnaser | https://docs.ansible.com/ansible/latest/user_guide/playbooks_tests.html#test-syntax -- "As of Ansible 2.5, using a jinja test as a filter will generate a warning." | 14:01 |
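A minimal runnable illustration of the syntax change being discussed; the task names and the /tmp path are made up:

```yaml
- hosts: localhost
  tasks:
    - name: Do something whose result we can test
      file:
        path: /tmp/example-log-dir
        state: directory
      register: log_directory

    - name: Old filter form (emits a deprecation warning as of Ansible 2.5)
      debug:
        msg: "directory was created or changed"
      when: log_directory | changed

    - name: Preferred jinja test form
      debug:
        msg: "directory was created or changed"
      when: log_directory is changed
```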
*** whoami-rajat has quit IRC | 14:01 | |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use is instead of | for tests https://review.openstack.org/628973 | 14:02 |
mordred | mnaser: ^^ | 14:02 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Update laravel legacy jobs for PHP 7.x https://review.openstack.org/628974 | 14:04 |
*** tmorin has quit IRC | 14:04 | |
mnaser | mordred: small tweak :) | 14:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-base-jobs master: Add fetch-output to base jobs https://review.openstack.org/628975 | 14:07 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-base-jobs master: Ignore errors on ssh key removal https://review.openstack.org/628976 | 14:07 |
mordred | mnaser: ah - yes - thanks! | 14:07 |
mordred | SpamapS: ^^ you use the zuul-base-jobs repo I believe? you might be interested in the stack there | 14:07 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use is instead of | for tests https://review.openstack.org/628973 | 14:10 |
mordred | mnaser: fixed. thanks! that's much better | 14:10 |
*** e0ne has joined #openstack-infra | 14:12 | |
*** ykarel|away has joined #openstack-infra | 14:12 | |
pabelanger | fungi: it doesn't look like 511851 was staged properly via base-test first, which broke things on merge | 14:12 |
smarcet | fungi: mordred: morning, when u have some time please review https://review.openstack.org/#/c/628974/ thx! | 14:13 |
*** irclogbot_1 has quit IRC | 14:14 | |
mordred | pabelanger: nah - we did base-test - it's just when we applied it to base, I only updated the base-minimal job description and not also the base job description | 14:15 |
mordred | silly me | 14:15 |
mordred | it makes me really wish zuul would consider a job definition that references a non-existent playbook as a config error (although it would be a bit expensive for it to do so) | 14:16 |
*** jamesmcarthur has quit IRC | 14:18 | |
dhellmann | config-core: this change to add a release job to the placement repo is a prereq for including it in the stein release, and the deadline for that is this week. Please add it to your review queue for the next couple of days. https://review.openstack.org/#/c/628240/ | 14:19 |
*** e0ne has quit IRC | 14:19 | |
pabelanger | mordred: ah, I see 628780 now | 14:24 |
AJaeger | mordred, mnaser, could you review https://review.openstack.org/#/c/628240/ , please? | 14:25 |
*** xek has quit IRC | 14:27 | |
*** xek has joined #openstack-infra | 14:28 | |
mordred | AJaeger: done | 14:30 |
mordred | pabelanger: yeah - oh well, we can't be perfect :) | 14:30 |
AJaeger | mordred: something still broken, see https://review.openstack.org/#/c/628731/ | 14:30 |
AJaeger | ERROR Unable to find playbook /var/lib/zuul/builds/45e70d41476244b1b1ebdcea184fd3d8/trusted/project_0/git.openstack.org/openstack-infra/project-config/playbooks/base-minimal/post.yaml | 14:31 |
mordred | oh ugh | 14:31 |
mordred | yeah - one sec | 14:31 |
mordred | that'll only be affecting those tests and not everybody | 14:31 |
mordred | but still ugh | 14:31 |
dhellmann | config-core: this patch to add new repos to the sahara project is also a prereq for a governance change for which this week's milestone is the deadline. Please add it to your review queue for early this week. https://review.openstack.org/#/c/628209/ | 14:32 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Rename base-minimal/post-ssh to base-minimal/post https://review.openstack.org/628983 | 14:33 |
mordred | pabelanger, AJaeger: ^^ | 14:33 |
mordred | pabelanger: maybe I should have started with getting your base job refactor in first :) | 14:33 |
mordred | AJaeger: wildcards work in gerritbot config? (re: 628209) | 14:34 |
mordred | AJaeger: ah - so they do. neat! | 14:35 |
*** smarcet has quit IRC | 14:36 | |
*** needsleep is now known as TheJulia | 14:36 | |
AJaeger | mordred: yes | 14:36 |
*** smarcet has joined #openstack-infra | 14:37 | |
ssbarnea|bkp2 | mordred: pabelanger : any chance to get https://review.openstack.org/#/c/625576/ merged? | 14:37 |
AJaeger | ssbarnea|bkp2: let's first fix the current breakage, please ;) | 14:38 |
AJaeger | ssbarnea|bkp2: your change has a high risk for failure and is untested... | 14:38 |
pabelanger | ssbarnea|bkp2: I haven't followed but thought talks with clarkb and corvus were that this is expected behavior for legacy reasons | 14:39 |
pabelanger | I'd much rather see jobs stop using zuul-cloner | 14:39 |
pabelanger | and delete that role | 14:39 |
*** irclogbot_1 has joined #openstack-infra | 14:39 | |
mordred | yeah - I'm worried about the fallout from making that change - it's super hard to test or figure out what might break | 14:39 |
*** fuentess has joined #openstack-infra | 14:39 | |
dhellmann | mordred : thank you! | 14:40 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Update laravel legacy jobs for PHP 7.x https://review.openstack.org/628974 | 14:40 |
ssbarnea|bkp2 | mordred: that chmod was evil in the first place, i do understand that we need to be careful about that change, but this does not mean we shouldn't repair the damage just because of potential risk. right? | 14:41 |
*** jamesmcarthur has joined #openstack-infra | 14:42 | |
ssbarnea|bkp2 | i think we should be able to find out in less than 30min if something important is affected and address it (with a job-level fix, a local fix, or even a revert) | 14:43 |
openstackgerrit | Merged openstack-infra/project-config master: Add publish-to-pypi template to placement https://review.openstack.org/628240 | 14:44 |
fungi | ssbarnea|bkp2: is your supposition that the issue https://review.openstack.org/512285 attempted to fix by adding that is no longer present? | 14:45 |
*** ykarel|away is now known as ykarel | 14:45 | |
*** smarcet has quit IRC | 14:45 | |
*** e0ne has joined #openstack-infra | 14:46 | |
ssbarnea|bkp2 | fungi: yep, my impression was that we no longer need the hardlinking support. | 14:46 |
*** e0ne has quit IRC | 14:47 | |
fungi | i think the discussion of the original problem starts at http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-15.log.html#t2017-10-15T12:24:45 | 14:48 |
openstackgerrit | Merged openstack-infra/project-config master: Add new Sahara repositories for split plugins https://review.openstack.org/628209 | 14:49 |
openstackgerrit | Merged openstack-infra/project-config master: Rename base-minimal/post-ssh to base-minimal/post https://review.openstack.org/628983 | 14:50 |
tobias-urdin | fungi: a while ago ianw_pto helped fix the forge.puppet.com credentials for the "openstack" account and iiuc it should be stored as a secret in zuul now but I can't seem to find it after browsing all repos and commit history, do you know where I could find it? | 14:50 |
*** anteaya has joined #openstack-infra | 14:50 | |
AJaeger | tobias-urdin: might be only in the "private" hiera secret store | 14:52 |
anteaya | so some confused third party ci person doesn't yet understand the purpose of an example wiki page: https://wiki.openstack.org/w/index.php?title=ThirdPartySystems/Example&diff=next&oldid=56443 | 14:52 |
anteaya | I'll change the text back and email the account cc'ing the infra email list | 14:53 |
fungi | thanks anteaya | 14:53 |
anteaya | I don't know how successful I'll be, so thought I'd let folks know | 14:53 |
fungi | tobias-urdin: is there a job uploading files to puppetforge? i can likely trace backwards from whatever's using it | 14:54 |
fungi | anteaya: if you're a wiki admin you should be able to roll back the edit | 14:54 |
tobias-urdin | fungi: no, I'm in the process of building that but the missing piece is what the secret was named so I can access it | 14:54 |
*** smarcet has joined #openstack-infra | 14:55 | |
fungi | anteaya: if you see an "undo" link next to it in the list at https://wiki.openstack.org/w/index.php?title=ThirdPartySystems&action=history then that should hopefully do what you need | 14:56 |
*** rlandy is now known as rlandy|rover | 14:56 | |
*** smarcet has quit IRC | 14:56 | |
*** zhangfei has quit IRC | 14:57 | |
fungi | tobias-urdin: i'll look in the usual places and see if we've recorded it somewhere | 14:57 |
tobias-urdin | fungi: thanks! | 14:57 |
fungi | tobias-urdin: i have a password for an openstackinfra user on puppetforge with a comment: This is for the "openstack" namespace. This used to be owned by a single user, but at request of PTL was assigned to infra. user names map 1:1 with emails so we could not reuse above. Note this + email gets filtered into a folder on infra-root imap server | 14:59 |
fungi | oh, sorry, that comment was for the openstack user, the openstackinfra user is noted as unused | 14:59 |
tobias-urdin | that's probably it, we had some issues since there was an openstack-infra namespace already that used the infra-root email | 15:01 |
*** irclogbot_1 has quit IRC | 15:01 | |
anteaya | fungi: seems I am a wiki admin, I found a rollback option | 15:01 |
tobias-urdin | the namespace for modules is the username of the account so openstack/glance (which we upload from git.o.org/openstack/puppet-glance) puppet module is the "openstack" account on forge.puppet.com | 15:02 |
fungi | anteaya: great! i thought i recalled you being one, but didn't have time to check your account perms | 15:02 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Clean up command sockets on stop https://review.openstack.org/628990 | 15:03 |
anteaya | fungi: you have a great memory | 15:04 |
fungi | tobias-urdin: makes sense. so anyway, if the intent is that the credentials for that account are going to be managed centrally (not by the puppet-openstack team), then we likely need the playbook which will use it added to the openstack-infra/project-config repo. if you want to propose that with a placeholder for the zuul secret, i can upload a revision of the change which includes the encrypted | 15:04 |
fungi | password | 15:04 |
mordred | fungi: yeah - I think the idea was to have a central "upload to puppetforge" job sort of like upload-pypi | 15:08 |
mordred | iirc | 15:08 |
* mordred wasn't 100% paying attention | 15:08 | |
tobias-urdin | https://review.openstack.org/#/q/topic:forge-publish+(status:open+OR+status:merged) | 15:09 |
*** irclogbot_1 has joined #openstack-infra | 15:09 | |
tobias-urdin | ^ is what i have so far, where https://review.openstack.org/#/c/627573/ is the one that will use the secret | 15:10 |
anteaya | also looking at that contributor's edits on the wiki, that username is a 9 digit number and it appears their co-worker's is a 10 digit number username | 15:10 |
anteaya | feels weird to me | 15:10 |
anteaya | but we don't have a rule about wiki usernames | 15:10 |
* anteaya is looking at https://wiki.openstack.org/w/index.php?title=ThirdPartySystems&diff=cur&oldid=167461 | 15:11 | |
fungi | anteaya: i've seen so many weird things from our community i've started to question them less and less often ;) | 15:11 |
anteaya | okay, thanks for the sanity check | 15:11 |
fungi | tobias-urdin: great! so i guess we need to add a secret in zuul.d/secrets.yaml and then add it to the secrets list for the release-openstack-puppet job description | 15:12 |
clarkb | ssbarnea|bkp2: pabelanger mordred ya my concern is that that role is for preserving "legacy" behavior | 15:13 |
clarkb | so updating it to not be legacy is potentially dangerous | 15:14 |
ssbarnea|bkp2 | fungi: re chmod on src, if i understand correctly the risk is only around legacy jobs, right? probably something like http://codesearch.openstack.org/?q=cp%20-l&i=nope&files=&repos= is the affected stuff | 15:14 |
clarkb | instead you should stop using zuul cloner | 15:14 |
fungi | what shall we call it? "puppetforge_credentials" looks like it would be most consistent with the other entries, but there is a lot of variation in there | 15:14 |
fungi | tobias-urdin: ^ | 15:14 |
fungi | ssbarnea|bkp2: lots of things perform a hardlinking copy under the hood... virtualenv (via tox or otherwise), git clone file:///... and more | 15:16 |
ssbarnea|bkp2 | fungi: the hardlinking limitation due to file permissions applies only when the original owner is different from the one trying to do the hardlinking. | 15:17 |
ssbarnea|bkp2 | fungi: if everything is run under the same user, there is no need to "hack" the default umask | 15:17 |
fungi | and testing for it becomes challenging because if it's between different filesystems then those tools have fallbacks which will do something other than hardlink (because that's not possible) but we have different filesystem layouts in different providers | 15:17 |
ssbarnea|bkp2 | and AFAIK, anything under ~zuul should be owned by zuul and no other users. | 15:17 |
tobias-urdin | fungi: sounds good to me, but i'll leave it entirely up to infra :) | 15:18 |
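What fungi describes would look roughly like the sketch below in project-config; the secret name follows his suggestion and the ciphertext is a placeholder, not real output of Zuul's encryption tooling:

```yaml
# zuul.d/secrets.yaml
- secret:
    name: puppetforge_credentials
    data:
      username: openstack
      password: !encrypted/pkcs1-oaep
        - placeholder-base64-ciphertext-generated-with-the-project-key

# attach it to the job that uploads to the forge
- job:
    name: release-openstack-puppet
    secrets:
      - puppetforge_credentials
```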
*** dpawlik has quit IRC | 15:18 | |
fungi | ssbarnea|bkp2: sure, top offenders will be jobs which use multiple accounts, such as devstack and devstack-based functional test jobs | 15:18 |
ssbarnea|bkp2 | fungi: even jobs with multiple accounts have workarounds that do not need the umask hack: just adding the stack user to the zuul group fixes the hardlinking issue | 15:18 |
ssbarnea|bkp2 | we only need to avoid o+w | 15:19 |
fungi | ssbarnea|bkp2: that happened? on old branches of devstack too? | 15:19 |
ssbarnea|bkp2 | fungi: i don't know that, i am only trying to eradicate the umask while addressing the risks that come with removing it. | 15:21 |
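A sketch of the group-based workaround ssbarnea|bkp2 mentions, written as Ansible tasks; the 'stack' user name and the src path are assumptions for illustration. The idea is that group membership plus group rw access lets a second account hardlink zuul-owned files without making the tree world-writable:

```yaml
- name: Add the devstack user to the zuul group so hardlinks are permitted
  become: yes
  user:
    name: stack
    groups: zuul
    append: yes

- name: Keep the src tree group-accessible while dropping the unsafe o+w bits
  become: yes
  file:
    path: /home/zuul/src
    state: directory
    recurse: yes
    mode: "u=rwX,g=rwX,o=rX"
```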
*** openstackgerrit has quit IRC | 15:22 | |
*** openstackgerrit has joined #openstack-infra | 15:23 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Ensure command_socket is last thing to close https://review.openstack.org/628995 | 15:23 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/devstack-gate master: Replace cp -l with --reflink=auto https://review.openstack.org/628998 | 15:31 |
dmsimard | btw, tagged ara 0.16.2 for release. No new features -- addresses warnings and a deprecation notice. | 15:34 |
*** e0ne has joined #openstack-infra | 15:42 | |
*** bobh_ has joined #openstack-infra | 15:43 | |
*** bobh_ has quit IRC | 15:47 | |
*** e0ne has quit IRC | 15:50 | |
openstackgerrit | Will Szumski proposed openstack-dev/pbr master: Do not globally replace path prefix https://review.openstack.org/629006 | 15:56 |
smcginnis | dmsimard: Not sure if you saw, but the ara release jobs failed. Well, just the docs publishing. Looks like the readthedocs config isn't fully set up. | 15:57 |
dmsimard | smcginnis: Oh? I'll look -- thanks | 15:57 |
clarkb | rtd changed their api and now we can't remotely trigger updates | 15:58 |
clarkb | iirc | 15:58 |
dmsimard | clarkb: even with the new webhook stuff ? | 15:58 |
dmsimard | smcginnis: do you have a link handy ? | 15:58 |
smcginnis | Yeah, let me track that down. | 15:58 |
smcginnis | dmsimard: http://lists.openstack.org/pipermail/release-job-failures/2019-January/001015.html | 15:59 |
smcginnis | dmsimard: The logs don't actually have much useful info though - http://logs.openstack.org/a3/a31a4f8cbbc84f3d96efb0ffc533621190fdde46/release/trigger-readthedocs-webhook/d500e56/job-output.txt.gz#_2019-01-07_15_25_57_416880 | 16:00 |
clarkb | dmsimard: yes they broke it after the webhook stuff. ianw filed a bug with them | 16:00 |
dmsimard | smcginnis: hmmm, I probably need to put the webhook_id elsewhere than https://github.com/openstack/ara/blob/master/zuul.d/layout.yaml#L3 | 16:00 |
dmsimard | clarkb: ok, I'll trigger it manually for now | 16:01 |
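For reference, the webhook id dmsimard mentions rides along as a project variable in the repo's Zuul config; a rough sketch, with the template and variable names taken from the openstack-zuul-jobs readthedocs setup as I understand it (treat them as assumptions) and a made-up id:

```yaml
# zuul.d/layout.yaml in the project repository
- project:
    templates:
      - docs-on-readthedocs
    vars:
      # integer id shown on the RTD project's integrations/webhook admin page
      rtd_webhook_id: '12345'
```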
*** bnemec is now known as stackymcstackfac | 16:04 | |
*** stackymcstackfac is now known as bnemec | 16:05 | |
*** fuentess has quit IRC | 16:06 | |
*** smarcet has joined #openstack-infra | 16:12 | |
*** jamesmcarthur has quit IRC | 16:17 | |
*** jamesmcarthur_ has joined #openstack-infra | 16:17 | |
*** pcaruana has quit IRC | 16:21 | |
*** jamesmcarthur_ has quit IRC | 16:22 | |
*** jamesmcarthur has joined #openstack-infra | 16:22 | |
*** wolverineav has joined #openstack-infra | 16:24 | |
fungi | heading out to run some lunch errands, but will be back as soon as i can | 16:28 |
*** whoami-rajat has joined #openstack-infra | 16:28 | |
*** wolverineav has quit IRC | 16:29 | |
*** smarcet has quit IRC | 16:29 | |
*** psachin has quit IRC | 16:31 | |
*** smarcet has joined #openstack-infra | 16:32 | |
openstackgerrit | Will Szumski proposed openstack-dev/pbr master: Do not globally replace path prefix https://review.openstack.org/629006 | 16:33 |
*** ramishra has quit IRC | 16:34 | |
*** armax has joined #openstack-infra | 16:39 | |
*** udesale has quit IRC | 16:40 | |
*** bobh_ has joined #openstack-infra | 16:41 | |
clarkb | ssbarnea|bkp2: would it be reasonable for your ansible use case to add a chmod prior to running ansible? or cloning the roles/playbooks to another location first? I'd really like to avoid breaking people in that legacy state by changing the expectations around that. We recognize there were bugs and deficiencies with that setup which is why we've replaced it entirely in zuulv3. | 16:42 |
clarkb | logan-: fwiw host_id: 704a6e4d2ae61ad0bf113de69b52cb6414dadb287241358ebaf1c7b2 shows up in a couple jobs that exhibit weird ipv4 connectivity between test nodes in limestone cloud. http://logs.openstack.org/31/628731/7/check/openstack-infra-multinode-integration-ubuntu-trusty/35c4982/zuul-info/inventory.yaml is one example with ovs vxlan tunnel over ipv4 not working and | 16:46 |
clarkb | http://logs.openstack.org/00/628200/1/gate/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/660080e/job-output.txt is a tripleo job unable to ssh from one node to the other for ansible over ipv4 | 16:46 |
clarkb | (still a small data set so unfortunately don't have much more info than that) | 16:46 |
*** fuentess has joined #openstack-infra | 16:47 | |
clarkb | dmsimard: https://github.com/rtfd/readthedocs.org/issues/4986 is the rtd bug | 16:48 |
clarkb | still open but looks to be accepted and should be fixed in the future | 16:48 |
*** smarcet has quit IRC | 16:51 | |
*** ginopc has quit IRC | 16:53 | |
openstackgerrit | Will Szumski proposed openstack-dev/pbr master: Do not globally replace path prefix https://review.openstack.org/629006 | 16:54 |
*** wolverineav has joined #openstack-infra | 16:54 | |
*** notmyname has quit IRC | 16:55 | |
openstackgerrit | Merged openstack-infra/system-config master: Turn on the future parser for all zuul mergers https://review.openstack.org/616295 | 16:56 |
*** wolverineav has quit IRC | 16:56 | |
*** notmyname has joined #openstack-infra | 16:57 | |
*** smarcet has joined #openstack-infra | 16:58 | |
*** rfolco has quit IRC | 16:58 | |
*** rfolco has joined #openstack-infra | 16:59 | |
*** rpittau has quit IRC | 17:00 | |
*** smarcet has quit IRC | 17:02 | |
*** aojea has quit IRC | 17:04 | |
openstackgerrit | Merged openstack-infra/zuul master: Fix ignored but tracked .keep file https://review.openstack.org/621391 | 17:06 |
openstackgerrit | Merged openstack-infra/system-config master: Turn on the future parser for zuul.openstack.org https://review.openstack.org/616296 | 17:06 |
clarkb | infra-root ^ I'll be watching that | 17:08 |
*** wolverineav has joined #openstack-infra | 17:08 | |
mordred | clarkb: ++ | 17:09 |
clarkb | fungi: the changes after that one are for lists.kata and lists.open, any chance you want to approve and/or babysit those with me today? | 17:09 |
*** dustinc has joined #openstack-infra | 17:11 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul-jobs master: [WIP] upload-pypi: add option to register packages https://review.openstack.org/629018 | 17:19 |
ssbarnea|bkp2 | clarkb: so we're not fixing the broken zuul-cloner role because of the risks and because it is supposed to go away. Still, I believe we include it in 99% of jobs based on https://github.com/openstack-infra/project-config/blob/ab0cb430d130aaed3e6d333384c4d6d8740040fe/playbooks/base/pre.yaml#L38 -- fetch-zuul-cloner does more than fetch it, it also messes with the src folder. | 17:20 |
ssbarnea|bkp2 | clarkb: can we make the role execution conditional? ... a step towards deprecation. | 17:21 |
ssbarnea|bkp2 | or better, to run the "umask" task only on old jobs. do we have a variable we can use to add a condition? | 17:21 |
clarkb | ssbarnea|bkp2: right I don't think we want to fix the frozen deprecated process. Instead we want to convert jobs to the new process. I think the plan for that was to make a different base job for legacy jobs. And the main base job wouldn't run the zuul cloner shim setup | 17:22 |
clarkb | ssbarnea|bkp2: but that process ran into problems because jobs were not marked legacy but had legacy dependencies. Probably what we can do is notify the dev list of the change happening then make the switch in a couple weeks | 17:23 |
clarkb | pabelanger: mordred ^ I think you had a lot more of thatp aged in than I did | 17:23 |
clarkb | corvus: http://paste.openstack.org/show/740460/ shows up on zuul node puppet runs. Can I just clean that up out of band to make the puppet logs quieter? | 17:25 |
clarkb | mordred: ^ you may know too as it seems related to the zuul dashboard hosting | 17:25 |
*** trown is now known as trown|lunch | 17:26 | |
mordred | clarkb: uhoh. what did I do? | 17:26 |
corvus | yeah, i think that's very old status page | 17:26 |
ssbarnea|bkp2 | smells like a chicken-and-egg kind of issue. never fixing "base" because someone is/may be using it. How about having a "base2" base to use. At least this approach allows people to adopt newer base(s) without having to worry about some project being too slow. | 17:26 |
clarkb | ssbarnea|bkp2: yes I'm saying we should fix base, just give people some time to change their jobs if they need to first | 17:26 |
clarkb | ssbarnea|bkp2: basically the thing that has been missing is someone to drive and coordinate that work. There isn't a lack of wanting to do it | 17:27 |
mordred | clarkb: I agree - I think we should fix base to not run fetch-zuul-cloner and have that only run in legacy-base | 17:27 |
ssbarnea|bkp2 | mordred: i like that idea. i was considering using a "when" condition on inclusion of this role. | 17:28 |
ssbarnea|bkp2 | import_role with when works ok, I just don't know what to check for (how to know which job is old/new) | 17:29 |
clarkb | ssbarnea|bkp2: I don't think we want to make it complicated like that. Instead rely on zuul's job inheritance to simplify it for us. Use base if your job is not legacy and legacy-base if it is | 17:29 |
clarkb | (I don't know if legacy-base exists yet) | 17:29 |
clarkb | base won't have the zuul cloner shim setup in it and legacy-base will | 17:30 |
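The split clarkb describes leans on ordinary Zuul job inheritance rather than a runtime 'when'; something like this sketch, with illustrative playbook paths:

```yaml
- job:
    name: base
    # no fetch-zuul-cloner shim here
    pre-run: playbooks/base/pre.yaml

- job:
    name: legacy-base
    parent: base
    # pre-run playbooks accumulate through inheritance, so this runs after
    # base's pre.yaml and is the only place the shim gets set up:
    pre-run: playbooks/legacy-base/pre.yaml
```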
clarkb | corvus: ok I'll clean those dirs up | 17:30 |
*** rf0lc0 has joined #openstack-infra | 17:30 | |
*** jpich has quit IRC | 17:31 | |
*** rf0lc0 has quit IRC | 17:31 | |
mordred | yes - legacy-base exists | 17:32 |
mordred | and it runs fetch-zuul-cloner | 17:32 |
*** rfolco has quit IRC | 17:33 | |
mordred | so I think we should be able to warn people, give it a little time, then remove fetch-zuul-cloner from base | 17:33 |
mordred | all of the autoconverted legacy jobs use legacy-base | 17:33 |
clarkb | ++ | 17:33 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/project-config master: WIP: attempt removal of fetch-zuul-cloner from base job https://review.openstack.org/629019 | 17:34 |
ssbarnea|bkp2 | mordred: clarkb ^^ so my WIP test above could be the future removal. I already created a DNM change that tests its effect on tripleo https://review.openstack.org/#/c/625680/ | 17:38 |
clarkb | ssbarnea|bkp2: I don't think the depends on will work because project-config changes must be merged first before they can be tested | 17:38 |
mordred | ssbarnea|bkp2: that unfortunately won't work ... | 17:38 |
mordred | yeah - what clarkb said | 17:38 |
clarkb | what we can do is troll logstash for zuul-cloner usage | 17:39 |
mordred | (this will get better with pabelanger's base job refactor, but that hasn't landed yet) | 17:39 |
clarkb | and cross check that with people using base and not legacy-base | 17:39 |
*** ykarel has quit IRC | 17:39 | |
clarkb | (I'm not sure how much work that is) | 17:39 |
mordred | clarkb: we could also work through pushing in pabelanger's base refactor so that we could test zuul-cloner removal with depends-on | 17:39 |
clarkb | zuul01 ran with futureparser and seems happy | 17:39 |
ssbarnea|bkp2 | i don't know how you can detect its usage, as the role runs as part of every job. | 17:40 |
clarkb | ssbarnea|bkp2: you'd be looking for jobs that actually run the zuul-cloner command later | 17:40 |
clarkb | if they don't then they don't need the shim | 17:40 |
clarkb | mordred: ya | 17:40 |
mordred | clarkb: https://review.openstack.org/#/q/status:open+topic:base-minimal-jobs - I was going to work on landing that once done with the zuulv3-output topic | 17:40 |
*** rfolco has joined #openstack-infra | 17:42 | |
*** rkukura has joined #openstack-infra | 17:42 | |
*** gyee has joined #openstack-infra | 17:45 | |
*** dtantsur is now known as dtantsur|afk | 17:45 | |
AJaeger | mordred: feel free to ping if you need review for zuulv3-output, I suggest we wrap that up without waiting another 12 months ;) | 17:46 |
mordred | AJaeger: totally. It's my main priority job-wise at the moment | 17:47 |
jrosser | i've seen mirror errors like this a fair few times now http://logs.openstack.org/26/628926/8/check/openstack-ansible-functional-centos-7/fec2d75/job-output.txt.gz#_2019-01-07_17_29_52_855806 | 17:47 |
clarkb | AJaeger: mordred is there much else to do after the base job switch? I guess convert a job or two and point people to that setup? | 17:48 |
*** wolverineav has quit IRC | 17:49 | |
clarkb | jrosser: http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/a4/4b/a44b7fc6344c56410b94f2e69ef07fd4b48abb6a_b72eb3dd/python-oslo-utils-lang-3.37.1-0.20181012100734.6e0b90b.el7.noarch.rpm exists but the 3.39 rpm your job requests does not. Those are caching proxies so you've tried to install a package that does not exist on the remote I think | 17:49 |
clarkb | jrosser: it's possible we've got cache mismatches between indexes and actual contents? except the index I see shows 3.37 and that rpm exists | 17:50 |
AJaeger | clarkb: https://review.openstack.org/#/q/topic:zuulv3-output+status:open is the list of open reviews - mordred is rebasing one after the other and merging them... | 17:50 |
ssbarnea|bkp2 | http://codesearch.openstack.org/?q=bin%2Fzuul-cloner&i=nope&files=&repos= reports ~444 occurrences, which makes me hopeless about reaching 0 in my lifetime. and using logstash does not make me more optimistic either | 17:50 |
mordred | clarkb: yeah - I think the next bit after that stack would be converting some of our main jobs - like devstack and build-sphinx | 17:50 |
openstackgerrit | Merged openstack-infra/puppet-ptgbot master: No longer needs room map in configuration https://review.openstack.org/625619 | 17:51 |
mordred | as examples - but also to get large portions of our system converted over | 17:51 |
mordred | I think we need to wait a bit before we can convert things like unittests base though | 17:51 |
mordred | we'll need to give deployers a deprecation period to get fetch-output into their base jobs | 17:52 |
clarkb | jrosser: https://trunk.rdoproject.org/centos7/a4/4b/a44b7fc6344c56410b94f2e69ef07fd4b48abb6a_b72eb3dd/ seems to only have 3.37 too. Any idea where 3.39 is coming from? | 17:52 |
mordred | but I think converting openstack-only base jobs should be straightforward | 17:52 |
clarkb | ssbarnea|bkp2: ya that's why I suggest we notify people via the list. Maybe include that codesearch link and a link to a logstash query | 17:52 |
clarkb | ssbarnea|bkp2: but let them fix it themselves | 17:52 |
clarkb | ssbarnea|bkp2: most of them should be using legacy-base if they came from the job conversion we did from jjb | 17:53 |
mordred | yah. I'd say using zuul-cloner and not using legacy-base should be an unsupported config | 17:54 |
ssbarnea|bkp2 | somehow i find the concept of a full switch from any v1 to v2 very hard to achieve. Can't we find a more progressive adoption approach? maybe we can enable/disable features/changes using a versioned variable, "zuul_job_version", which is implicitly 1 but we could define a bigger value in our jobs. And we can have tasks that run based on the value of this. | 17:56 |
jrosser | clarkb: i think that is a dependency of some other package, it's easier to parse here but there are a bunch it can't find http://logs.openstack.org/26/628926/8/check/openstack-ansible-functional-centos-7/fec2d75/logs/ara-report/result/f84f0c97-d4d7-416b-8972-b3abcaa08833/ | 17:56 |
clarkb | ssbarnea|bkp2: we crossed that bridge when zuul v3 decided to be incompatible with < 3 | 17:56 |
clarkb | ssbarnea|bkp2: I think for the future we should be more careful, but zuul-cloner is an artifact of zuul v2 which we do not run anymore and we should stop using it entirely | 17:57 |
clarkb | jrosser: I wonder if RDO is updating packages before putting all of the dependencies in place | 17:57 |
clarkb | jrosser: dmsimard may know | 17:57 |
openstackgerrit | Merged openstack-infra/puppet-ptgbot master: No longer build index page in puppet-ptgbot https://review.openstack.org/626911 | 17:57 |
AJaeger | http://zuul.openstack.org/status/change/628731,7 has been waiting 3 hours for Debian nodes ;( Any problems with Debian nodes? | 17:58 |
dmsimard | clarkb, jrosser: having lunch right now, I'll be able to check in a few | 17:59 |
jrosser | no worries - i'm travelling home soon but i've seen a fair few of these so it's worth a bit of a dig later | 17:59 |
clarkb | AJaeger: | 0001544620 | ovh-bhs1 | debian-stretch | a3145fd4-7ce5-4a5a-be55-6b2407f00cac | 158.69.65.196 | 2607:5300:201:2000::335 | ready | 00:02:24:55 | locked | | 18:00 |
clarkb | AJaeger: it is odd that a ready node would be locked for 2.5 hours | 18:00 |
*** ykarel has joined #openstack-infra | 18:00 | |
*** diablo_rojo has joined #openstack-infra | 18:00 | |
*** derekh has quit IRC | 18:00 | |
clarkb | | 0001547894 | rax-dfw | debian-stretch | 2f0bd8e3-135e-47d5-b28c-8cde74f3af85 | None | None | building | 00:00:00:17 | locked | is a new node building | 18:00 |
clarkb | and | 0001547866 | inap-mtl01 | debian-stretch | 15b690bd-0005-46c1-b47a-4047db6ed536 | 198.72.124.91 | | in-use | 00:00:00:49 | locked | is a recently used node | 18:00 |
clarkb | AJaeger: my guess is that locked ready node is tied to that job | 18:01 |
*** jcoufal has quit IRC | 18:01 | |
clarkb | Shrews: ^ is that something you might want to look at? | 18:01 |
logan- | interesting log on that first job you linked clarkb. it looks like it has connectivity across the vxlan but only in one direction? | 18:01 |
clarkb | logan-: no I think it is broken in both directions. We ping the local IP locally and remotely. The pings that succeed are for the local IP | 18:02 |
AJaeger | clarkb: thanks | 18:02 |
clarkb | logan-: that helps us to know if the interface itself is broken or if it is the tunnel. The local IP pinging implies the tunnel is the issue | 18:03 |
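The check clarkb outlines can be expressed as a couple of ad-hoc tasks; the 172.24.4.x addresses come from the log, the host group name is an assumption:

```yaml
- hosts: primary
  tasks:
    - name: Ping our own tunnel address (proves the local interface is up)
      command: ping -c 3 172.24.4.1

    - name: Ping the subnode across the vxlan (proves the tunnel path works)
      command: ping -c 3 172.24.4.2
```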
*** jcoufal has joined #openstack-infra | 18:04 | |
ssbarnea|bkp2 | clarkb: regarding rdo, if i remember correctly updating deps doesn't happen beforehand, due to its slow speed. but overall i think the logic has changed over time. | 18:04 |
logan- | ah ok, so 172.24.4.1 is on the 'primary' side of the tunnel, and 172.24.4.2 is on secondary | 18:05 |
clarkb | logan-: yup | 18:05 |
*** diablo_rojo has quit IRC | 18:05 | |
Shrews | clarkb: i can look, but it's not uncommon for zuul to hold ready node locks for that long | 18:06 |
Shrews | will see if i can track that one down though | 18:06 |
*** ykarel has quit IRC | 18:06 | |
clarkb | Shrews: would it be waiting on an executor to be available to run the job? iirc we get nodes before an executor | 18:07 |
clarkb | basically what prevents that job from starting if it has a node and is queued. Executor availability is all I can think of right now | 18:08 |
clarkb | ssbarnea|bkp2: re cloner shim. Basically most jobs that use it will be carried over from our JJB conversion and use legacy-base. There is potential for some jobs to use the shim and parent to base and not legacy-base. If they do this they will break and we can fix them pretty easily by converting to legacy-base. So there might be a short period of brokenness, but it comes with a straightforward fix. If we | 18:11 |
clarkb | communicate that people can check things beforehand (and fix them beforehand) then I think we've done a good job there and anyone broken should have a relatively easy path forward for fixing too | 18:11 |
clarkb | ssbarnea|bkp2: if we want to wait on the base job refactoring that allows us to test more of the base jobs we can do that too which will make the step of auditing whether or not it will break you easier | 18:11 |
*** agopi has quit IRC | 18:14 | |
Shrews | clarkb: looks like ovh-bhs1 is at quota, so it's paused trying to handle the most recent request, but frequently getting launch errors. the active request queue is slowly shrinking, but node requests piled up in that queue are delayed (that node is part of one of the requests, but that one still needs 1 more node) | 18:15 |
*** agopi has joined #openstack-infra | 18:15 | |
clarkb | ah so this is multinode requests when at quota | 18:16 |
Shrews | clarkb: correct | 18:16 |
clarkb | amorin: ^ hello not sure if you are back from holidays, but we've noticed our quota in bhs1 is lower than we had previously | 18:17 |
clarkb | amorin: do we need to update our configs or is that a bug? | 18:17 |
*** rkukura has quit IRC | 18:18 | |
*** agopi has quit IRC | 18:18 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:19 | |
*** agopi has joined #openstack-infra | 18:19 | |
*** jamesmcarthur has quit IRC | 18:19 | |
*** trown|lunch is now known as trown | 18:20 | |
Shrews | clarkb: i'm confused by that since we have max-servers as 150 but i count only 36 ovh-bhs1 nodes | 18:22 |
Shrews | something out of sync? | 18:22 |
clarkb | Shrews: yes our quota in bhs1 has been lowered | 18:22 |
clarkb | Shrews: don't know why or how yet, but basically nodepool is operating under the lower quota numbers | 18:22 |
clarkb | Shrews: part of it is we've kept frickler's test nodes around in that cloud so that's ~20 instances | 18:23 |
*** jamesmcarthur_ has quit IRC | 18:23 | |
clarkb | but that still doesn't account for the full decrease. | 18:23 |
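For context, nodepool launches at most max-servers instances per pool but also respects whatever quota the cloud reports, whichever is lower, which is why a quietly lowered tenant quota shows up as far fewer than the configured 150 nodes. The relevant knob, with surrounding structure abbreviated:

```yaml
# nodepool.yaml (sketch)
providers:
  - name: ovh-bhs1
    cloud: ovh
    pools:
      - name: main
        # effective limit is min(max-servers, quota the cloud grants us)
        max-servers: 150
```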
*** jamesmcarthur has joined #openstack-infra | 18:23 | |
Shrews | hrm | 18:23 |
clarkb | frickler: ^ maybe we should clean those up now that it has been almost a month? | 18:23 |
*** jamesmcarthur has quit IRC | 18:24 | |
*** jamesmcarthur has joined #openstack-infra | 18:24 | |
*** jamesmcarthur has quit IRC | 18:24 | |
*** jamesmcarthur has joined #openstack-infra | 18:25 | |
*** jamesmcarthur has quit IRC | 18:26 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:26 | |
*** agopi_ has joined #openstack-infra | 18:26 | |
*** jamesmcarthur_ has quit IRC | 18:27 | |
*** agopi has quit IRC | 18:29 | |
*** dkehn has joined #openstack-infra | 18:29 | |
*** jamesmcarthur has joined #openstack-infra | 18:32 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:34 | |
*** jamesmcarthur has quit IRC | 18:34 | |
fungi | clarkb: sure, once i'm caught up, happy to keep an eye on puppeting of the listservers | 18:37 |
clarkb | fungi: great, want to approve the first one when you are in a good spot for that? I'll be around all day so can switch to that when you are ready | 18:37 |
*** jamesmcarthur_ has quit IRC | 18:40 | |
fungi | clarkb: "first one" being 628216? | 18:43 |
clarkb | fungi: yup | 18:43 |
fungi | i have it queued up in gertty now, will approve shortly, sure. thanks! | 18:44 |
*** rkukura has joined #openstack-infra | 18:45 | |
*** wolverineav has joined #openstack-infra | 18:48 | |
clarkb | I'll pop out for a bit now then should be back by the time it can be merged and applied | 18:50 |
*** wolverineav has quit IRC | 18:52 | |
dmsimard | infra-root: apache on mirror02.regionone.limestone.openstack.org is complaining about a read-only filesystem when trying to write cache header files, ex: http://paste.openstack.org/raw/740463/ | 18:54 |
clarkb | dmsimard: any errors in dmesg or the kernel log indicating why the fs is ro? | 18:55 |
fungi | dmesg ring buffer has been spammed by filesystem errors | 18:55 |
dmsimard | clarkb: there are ext4-fs errors in dmesg but I'm trying to find when it started or if there are any afs errors | 18:56 |
dmsimard | also, yes, what fungi said -- actual non-fs errors have been cycled out | 18:56 |
clarkb | dmsimard: that cache isn't on afs, it is "local" | 18:56 |
logan- | verified hv is not full /dev/mapper/SYSVG-ROOT 465G 251G 191G 57% / | 18:56 |
dmsimard | clarkb: oh, oops | 18:57 |
fungi | yeah, we have a lvm2 logical volume for it, from a vg on top of pv /dev/vdb1 | 18:57 |
dmsimard | I'm working my way up the apache logs, so far this has been ongoing since at least jan 2 | 18:58 |
dmsimard | apache logs go as far back as dec 31st and there were read-only errors already | 18:59 |
fungi | syslog only has a one-week retention there, yeah | 18:59 |
dmsimard | [Mon Dec 31 06:35:17.172980 2018] [cache_disk:warn] [pid 2883:tid 140135773492992] (30)Read-only file system | 18:59 |
fungi | oldest syslog entry is: | 18:59 |
fungi | Dec 31 06:25:09 mirror02 kernel: [2073564.770768] EXT4-fs error (device dm-0): ext4_lookup:1606: inode #3367: comm updatedb.mlocat: deleted inode referenced: 8251 | 18:59 |
clarkb | fwiw I don't think that is the source of jrosser's 404 as we seem to proxy without caching | 19:00 |
fungi | right, i think this would account for proxy performance degradation | 19:00 |
fungi | not 404 | 19:00 |
jrosser | It may be worth looking in logstash for similar because I have a hunch all of these I’ve seen were in limestone | 19:01 |
dmsimard | yes and no | 19:01 |
dmsimard | I found that issue as part of troubleshooting the 404 :) | 19:01 |
fungi | right, we ought to fix it regardless | 19:02 |
clarkb | ya shoudl be fixed | 19:02 |
dmsimard | mnaser pointed out that we may have been pulling a stale .repo file but it doesn't appear to be that way | 19:02 |
clarkb | dmsimard: no I checked directly and it seems to line up with our mirror | 19:02 |
clarkb | my hunch is the rdo repo updating packages with new deps before adding the new deps | 19:03 |
dmsimard | I was in #openstack-ansible attempting to help, tl;dr is that OSA is setting up a recent repo (built today) which contains the packages that are 404'ing but for some reason yum is looking at this month-old repo | 19:03 |
dmsimard | There is a 404 on http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/a4/4b/a44b7fc6344c56410b94f2e69ef07fd4b48abb6a_b72eb3dd/python2-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.noarch.rpm | 19:03 |
dmsimard | because that package is actually at https://trunk.rdoproject.org/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm | 19:04 |
fungi | looks like that pv is via cinder volume f18e717d-8981-4134-8fe1-57596f7481e4 | 19:04 |
dmsimard | or http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm | 19:04 |
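The mismatch is easy to confirm by comparing the caching proxy against the origin it fronts, using the URLs quoted above (the proxy's /rdo prefix maps to trunk.rdoproject.org):

    # HEAD request through the mirror's caching proxy
    curl -sI http://mirror.regionone.limestone.openstack.org:8080/rdo/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm
    # same object fetched directly from the origin
    curl -sI https://trunk.rdoproject.org/centos7/90/44/9044841473d3c9a4c70882bfb5ca59f89cf7afa0_6dde040f/python-glanceclient-2.15.0-0.20181226110746.c4c92ec.el7.src.rpm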
*** tosky has quit IRC | 19:05 | |
fungi | logan-: possible we briefly lost contact with the cinder backend sometime >1 week ago? that's usually sufficient for active volumes to go read-only on us | 19:05 |
dmsimard | that /90/44 repository is set up properly by OSA as far as we can tell http://logs.openstack.org/26/628926/8/check/openstack-ansible-functional-centos-7/fec2d75/logs/ara-report/result/fd68c556-12ed-4096-9d07-697951a4b3cf/ | 19:05 |
dmsimard | I'm not exactly sure what is going on, would like to address the caching issue and see if we can reproduce | 19:06 |
fungi | if i stop apache2 and openafs services on mirror02.regionone.limestone i should be able to remount those filesystems read-write, but it probably makes more sense to reboot the instance anyway | 19:07 |
fungi | infra-root: objections to an emergency reboot of mirror02.regionone.limestone? ^ | 19:07 |
logan- | fungi: maybe when we rebooted hosts to update the kernel for nested virt fixes, ceph hung io long enough to kick it into ro.. i can't remember exactly when that was, a couple weeks ago though | 19:07 |
*** smarcet has joined #openstack-infra | 19:07 | |
fungi | logan-: that could easily be it. we don't have syslog back that far unfortunately to get a more exact timestamp | 19:08 |
dmsimard | fungi: I think we'd want to do a stop/start to ensure we're on a new process with a brand new volume connection | 19:08 |
clarkb | fungi: considering it broken anyway, seems fine. We can disable in nodepool too if we want to be more graceful about it | 19:08 |
fungi | dmsimard: yeah, rebooting the instance will certainly stop and start the processes running in it ;) | 19:08 |
clarkb | dmsimard: can we stop/start as users of nova? | 19:08 |
dmsimard | fungi: I mean the KVM process | 19:08 |
clarkb | oh I read it as stop/start the qemu process | 19:08 |
fungi | ohhh | 19:08 |
pabelanger | clarkb: mordred: https://review.openstack.org/513506/ removed zuul-cloner from base, I think the issue there was that legacy tox jobs still depend on it. Maybe we just reparent them to legacy base? | 19:09 |
pabelanger | clarkb: mordred: but agree, a heads up to ML about fallout might be a good idea | 19:09 |
fungi | yeah, i can `sudo poweroff` in the instance and then `openstack server start` it fresh | 19:09 |
dmsimard | I'm not sure to what extent it applies to ceph, I remember iscsi issues that could only be resolved by spinning up a new process and a soft reboot wasn't enough | 19:09 |
fungi | though when we've seen this sort of thing in other clouds in the past, just remounting the filesystems read-write after a good fsck is usually sufficient | 19:10 |
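That remount-without-reboot path would look roughly like this (a sketch: the logical volume name appears later in this log, while the mountpoint and service names are assumptions about this host):

    # stop the services writing to the affected filesystem
    sudo systemctl stop apache2 openafs-client
    # unmount, repair, and remount (mountpoint is illustrative)
    sudo umount /var/cache/apache2
    sudo fsck -y /dev/main/mirror02-cache-apache2
    sudo mount /var/cache/apache2
    sudo systemctl start openafs-client apache2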
*** wolverineav has joined #openstack-infra | 19:10 | |
*** wolverineav has quit IRC | 19:10 | |
*** wolverineav has joined #openstack-infra | 19:10 | |
*** jamesmcarthur has joined #openstack-infra | 19:11 | |
dmsimard | I suppose we should first attempt to remount before considering rebooting | 19:11 |
fungi | well, the outage from a reboot should be brief | 19:15 |
fungi | but we can dial down max-servers first if we want | 19:15 |
clarkb | that is the safest way | 19:17 |
clarkb | but I agree should be short and if we think jobs are broken anyway... | 19:17 |
*** jamesmcarthur has quit IRC | 19:18 | |
clarkb | ok I have to pop out for a few minutes. back in a few | 19:18 |
dmsimard | fungi: any of these solutions sound good to me -- I can send a patch for max-servers or we can put nl02 in the emergency file temporarily | 19:22 |
*** shardy has quit IRC | 19:24 | |
*** shardy has joined #openstack-infra | 19:25 | |
fungi | probably simplest to just do nl02 in emergency and then manually update max-servers on it, then wait for the used count to empty, then we can poweroff the mirror instance and boot it again | 19:30 |
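Draining a region in nodepool is a one-line change per provider pool; an illustrative snippet (not the exact production file):

    providers:
      - name: limestone-regionone
        # ... cloud/region settings elided ...
        pools:
          - name: main
            max-servers: 0  # drain the region; restore the previous value afterwards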
dmsimard | Was going to add it in the emergency file but saw a .swp file from a minute ago :p | 19:31 |
fungi | that was me | 19:31 |
dmsimard | ok | 19:31 |
*** gfidente has quit IRC | 19:32 | |
fungi | #status log temporarily lowered max-servers to 0 in limestone-regionone in preparation for a mirror instance reboot to clear a cinder volume issue | 19:32 |
openstackstatus | fungi: finished logging | 19:32 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Updated laravel jobs to include php7 repo https://review.openstack.org/629035 | 19:32 |
fungi | i'll keep an eye on that to make sure the current puppet/ansible pulse doesn't re-up it | 19:32 |
fungi | that was the only provider on nl02 booting nodes, btw. the others (citycloud, packethost) were already set to max-servers 0 | 19:34 |
*** agopi__ has joined #openstack-infra | 19:34 | |
*** smarcet has quit IRC | 19:36 | |
*** agopi_ has quit IRC | 19:36 | |
clarkb | pabelanger: ^ re packethost have you been able to follow up on using some osa there? | 19:36 |
clarkb | I'm back now too | 19:37 |
pabelanger | clarkb: Yah, they are keen. We are just finishing up the POC with ansible-network. | 19:37 |
clarkb | NICE | 19:37 |
clarkb | er didn't mean the caps there but still nice :) | 19:37 |
pabelanger | clarkb: I think we actually can start pushing up code to review.o.o next week, and start deploying it from zuul | 19:38 |
cloudnull | ^ nice | 19:38 |
*** smarcet has joined #openstack-infra | 19:38 | |
pabelanger | but yah, OSA works well on stable/rocky | 19:38 |
smarcet | fungi: mordred:clarkb: please review https://review.openstack.org/#/c/629035/ thx | 19:40 |
dmsimard | #status log mirror02.regionone.limestone.openstack.org's filesystem on the additional cinder volume went read only for >1 week (total duration unknown) causing errors when apache was attempting to update its cache files. | 19:41 |
openstackstatus | dmsimard: finished logging | 19:41 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Add fetch-output and ensure-output-dirs tests https://review.openstack.org/628731 | 19:43 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Use is instead of | for tests https://review.openstack.org/628973 | 19:44 |
clarkb | mordred: ^ fyi | 19:44 |
mordred | clarkb: woot | 19:44 |
mordred | smarcet: zuul is sad about that patch | 19:45 |
smarcet | yes saw it | 19:45 |
smarcet | mordred: will fix sorry about that | 19:45 |
mordred | smarcet: no worries! | 19:45 |
*** jamesmcarthur has joined #openstack-infra | 19:47 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Updated laravel jobs to include php7 repo https://review.openstack.org/629035 | 19:52 |
*** whoami-rajat has quit IRC | 19:58 | |
*** kjackal has quit IRC | 20:04 | |
smarcet | mordred:fungi: fixed https://review.openstack.org/#/c/629035/ | 20:07 |
AJaeger | mordred: want to abandon https://review.openstack.org/628668 now? | 20:08 |
*** agopi__ has quit IRC | 20:09 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstack-zuul-jobs master: Updated laravel jobs to include php7 repo https://review.openstack.org/629035 | 20:21 |
*** bobh_ has quit IRC | 20:21 | |
*** imacdonn has joined #openstack-infra | 20:22 | |
clarkb | fungi: should I approve https://review.openstack.org/#/c/628216/4 or were you still planning to do it? | 20:27 |
fungi | i've approved it just now and am watching syslog on the server | 20:29 |
clarkb | great | 20:30 |
fungi | last puppet apply there was at 19:58:40 | 20:30 |
*** e0ne has joined #openstack-infra | 20:40 | |
imacdonn | hi guys ... https://review.openstack.org/#/c/612393/ failed in the gate due to a tempest timeout ... must it be rechecked, or is there any shortcut (requeue?) | 20:41 |
clarkb | imacdonn: in general failures have to be rechecked due to the "clean check" requirement | 20:44 |
clarkb | the major exception to this is when we are trying to get bug fixes in for the gate itself and want to expedite that process to avoid unnecessary gate resets | 20:44 |
clarkb | (this is why I keep pushing for people to help debug and fix gate errors) | 20:44 |
imacdonn | clarkb: yeah, I understand that, but it seems like a timeout has a high probability of not being the code's fault ... I think exceptions may have been given in the past, but I don't recall the circumstances | 20:45 |
imacdonn | just seems like a waste of resources to go through check again | 20:45 |
imacdonn | understood, though ... figured it was worth asking ;) | 20:46 |
clarkb | imacdonn: what we are trying to avoid there is making it easy for flaky code to go through the gate and merge, then be flaky for everyone (we've seen this happen in the past and it is one of the reasons for clean check; the other is ensuring that we have relatively up-to-date results, avoiding unnecessary gating) | 20:46 |
clarkb | imacdonn: and yes the flaky gate failures are often not directly related to the specific change that failed. | 20:47 |
clarkb | which is why I keep pushing people to identify the failures, track them with elastic-recheck and ideally fix them if we are able | 20:47 |
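For context, elastic-recheck tracks each known gate failure as a small YAML file containing a Lucene query against the indexed job logs; a hypothetical entry might look like this (filename and message text are illustrative, not a real tracked bug):

    # queries/1234567.yaml -- named after the launchpad bug number
    query: >
      message:"Failed to establish authenticated ssh connection" AND
      tags:"console"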
imacdonn | clarkb: yeah, I get it .... trying to diagnose timeouts on infra that you have little visibility into can be challenging, though ;) | 20:48 |
clarkb | imacdonn: ya one of the frequent steps we have to take is to add additional logging around the unhappy code | 20:49 |
fungi | corvus: noticing /var/lib/bind/zones/zuulci.org/zone.db.signed on adns1.opendev.org is (much) older than zone.db, and rndc zonestatus doesn't list a "next resign node" or "next resign time" for it (also says "secure: no"). still digging to see why it's not getting new sigs | 20:51 |
fungi | oh! /etc/bind/keys/zuul-ci.org is empty | 20:52 |
fungi | that could be related ;) | 20:52 |
openstackgerrit | Merged openstack-infra/system-config master: Fix glob for lists.katacontainers.io https://review.openstack.org/628216 | 20:52 |
fungi | same on adns1.openstack.org as well | 20:53 |
fungi | i wonder how we got a signed zone for it to begin with | 20:53 |
fungi | er, meant to say /etc/bind/keys/zuulci.org is empty | 20:54 |
*** bobh_ has joined #openstack-infra | 20:54 | |
*** jamesmcarthur has quit IRC | 20:54 | |
fungi | /etc/bind/keys/zuul-ci.org (with the hyphen) definitely has content | 20:55 |
*** smarcet has quit IRC | 20:58 | |
fungi | looks like we're missing a section for the zuulci.org zone in /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o | 20:58 |
*** bobh_ has quit IRC | 21:00 | |
fungi | working on getting some added now | 21:02 |
corvus | fungi: since it was registered through csc, it was probably never really signed | 21:05 |
corvus | (cause i think opendev.org was the first csc domain we dnssec'd) | 21:05 |
fungi | makes sense | 21:06 |
fungi | strangely, it has a zone.db.signed file anyway | 21:06 |
*** smarcet has joined #openstack-infra | 21:07 | |
clarkb | corvus: before I forget, want to follow up on the thread about the k8s walkthrough with a time selection? Probably want to do that today so people can make time tomorrow if that is when we are doing it | 21:07 |
corvus | clarkb: will do now | 21:07 |
dmsimard | fungi: need to take off and I'll be back later, it looks like limestone is clear: http://grafana.openstack.org/d/WFOSH5Siz/nodepool-limestone?orgId=1 | 21:08 |
fungi | dmsimard: yep, i'll get it rebooted thoroughly here in a but | 21:09 |
fungi | er, in a bit | 21:09 |
*** xek has quit IRC | 21:11 | |
fungi | #status log generated and added dnssec keys for zuulci.org to /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o | 21:15 |
openstackstatus | fungi: finished logging | 21:15 |
fungi | hopefully that'll get things rolling | 21:16 |
*** smarcet has quit IRC | 21:19 | |
clarkb | fungi: do we need to give the registrar soem of that info? | 21:20 |
clarkb | seems like we had to do that for opendev | 21:20 |
clarkb | fungi: also lists.kata should be puppeting in the next 10-15 minutes I think | 21:21 |
clarkb | (I've got a shell there now too) | 21:21 |
*** smarcet has joined #openstack-infra | 21:21 | |
fungi | yeah, the last pulse was a no-op but i think it started before the change merged | 21:22 |
fungi | clarkb: we likely need to provide ds records to csc for zuulci.org once everything is confirmed working | 21:22 |
fungi | but i'd hold off doing that until we see the serial update | 21:23 |
*** wolverineav has quit IRC | 21:26 | |
*** kgiusti has left #openstack-infra | 21:27 | |
clarkb | fungi: and that just affects the ability to verify the signed zone? | 21:30 |
fungi | right | 21:30 |
clarkb | fungi: lists.kata lgtm | 21:30 |
fungi | clarkb: yeah, other than all the deprecation messages it looks to have been a no-op? | 21:31 |
clarkb | yup | 21:31 |
fungi | i'll go ahead and approve the next change now | 21:31 |
clarkb | ++ | 21:31 |
fungi | and watch lists.o.o accordingly | 21:31 |
*** jcoufal has quit IRC | 21:31 | |
clarkb | fungi: and so I understand the dns thing better, csc wasn't syncing the zone because it wasn't properly signed? | 21:36 |
fungi | had nothing to do with csc | 21:36 |
clarkb | oh ns1 and ns2 weren't syncing it from adns1? | 21:36 |
fungi | ns1/ns2.opendev.org were serving old copies of the zone because that's what adns1.opendev.org was providing them | 21:36 |
fungi | adns1.opendev.org had a zone.db.signed file (presumably copied from adns1.openstack.org?) corresponding to before the ns record changes in the current zone.db file | 21:37 |
fungi | and zone.db.signed is what was getting served | 21:38 |
clarkb | got it | 21:38 |
fungi | if it had been updating zone.db.signed on each new zone.db change, that would have worked out fine | 21:38 |
*** jamesmcarthur has joined #openstack-infra | 21:39 | |
fungi | but since it had no dnssec keys for the zuulci.org zone, it wasn't able to sign the newer version of the zone so kept serving the old signed one | 21:39 |
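The symptom is visible from outside by comparing SOA serials (real dig invocations; the serials returned are whatever the servers currently publish):

    # serial the hidden master is serving
    dig @adns1.opendev.org zuulci.org SOA +short
    # serial the public nameservers have picked up
    dig @ns1.opendev.org zuulci.org SOA +short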
fungi | the fix (i'm hoping!) was to run the dnssec-keygen commands from https://docs.openstack.org/infra/system-config/dns.html#adding-a-zone and then insert their contents into /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o | 21:41 |
fungi | if not, at least i'll get to see what else is missing next | 21:41 |
fungi | but yeah, if zone.db.signed gets updated here in a little while, then we can run the dnssec-dsfromkey command there and provide the output to csc | 21:42 |
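Per that documentation, the key generation and DS extraction steps are roughly as follows (a sketch; NNNNN stands in for the key tag assigned at generation time, and the key sizes are illustrative):

    cd /etc/bind/keys/zuulci.org
    # zone-signing key
    dnssec-keygen -a RSASHA256 -b 2048 -n ZONE zuulci.org
    # key-signing key
    dnssec-keygen -f KSK -a RSASHA256 -b 4096 -n ZONE zuulci.org
    # later, produce the DS records to hand to the registrar (csc)
    dnssec-dsfromkey Kzuulci.org.+008+NNNNN.key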
clarkb | imacdonn: digging into that failure it seems the main tempest run was fine http://logs.openstack.org/93/612393/21/gate/tempest-full-py3/952f7f7/job-output.txt.gz#_2019-01-07_18_28_17_260583 then the slowtests run gets really unhappy for about 35 minutes and the job times out | 21:43 |
clarkb | fungi: got it | 21:43 |
fungi | infra-root: i've powered off mirror02.regionone.limestone now that there are no jobs running in that region. i'll get it booted back up and checked out here in a few | 21:43 |
*** e0ne has quit IRC | 21:44 | |
*** smarcet has quit IRC | 21:44 | |
imacdonn | clarkb: I guess they weren't kidding when they marked them as "slow" :) | 21:44 |
clarkb | imacdonn: dstat shows the node goes relatively idle after the first tempest run too | 21:46 |
clarkb | I think that rules out a memory or cpu or disk issue | 21:46 |
clarkb | imacdonn: http://logs.openstack.org/93/612393/21/gate/tempest-full-py3/952f7f7/controller/logs/screen-n-cpu.txt.gz#_Jan_07_18_33_34_967897 shows us virtual interface creation failed (and it had ~5 minutes to do that) | 21:50 |
openstackgerrit | Merged openstack-infra/system-config master: Fix glob for lists.o.o https://review.openstack.org/628217 | 21:51 |
clarkb | http://logs.openstack.org/93/612393/21/gate/tempest-full-py3/952f7f7/controller/logs/screen-q-agt.txt.gz#_Jan_07_18_28_36_447492 neutron seems to think the device was created properly | 21:52 |
*** auristor has quit IRC | 21:53 | |
fungi | wow, neat, mirror02.regionone.limestone is refusing to let me ssh in now, and it's been at least 5 minutes since i issued the server start command. guess i'll check the console | 21:54 |
fungi | lovely, it can't fsck /dev/main/mirror02-cache-apache2 so has dropped to single-user | 21:56 |
imacdonn | clarkb: hmm... message queue issue ? | 21:57 |
logan- | fungi: :( lmk if i can be of any assistance on my end | 21:58 |
clarkb | imacdonn: possibly. Looking at rabbit logs there are some unexpected disconnects but none from the nova-compute pids | 21:58 |
clarkb | imacdonn: do those events go through nova conductor instead? | 21:58 |
*** bnemec has quit IRC | 21:58 | |
clarkb | hrm the disconnects all seem to be uwsgi processes so not conductor either | 21:59 |
clarkb | imacdonn: probably need nova and/or neutron to dig into that more. Maybe dansmith or slaweq can help | 22:00 |
clarkb | http://status.openstack.org/elastic-recheck/#1808171 may be related as well | 22:01 |
*** trown is now known as trown|outtypewww | 22:01 | |
*** wolverineav has joined #openstack-infra | 22:01 | |
*** wolverineav has quit IRC | 22:01 | |
*** wolverineav has joined #openstack-infra | 22:01 | |
fungi | logan-: thanks! just getting pulled in lots of directions so taking me a few minutes to dig up credentials | 22:02 |
clarkb | imacdonn: ya that logstash signature matches except for the test name. So we may want to broaden that query | 22:02 |
*** bnemec has joined #openstack-infra | 22:02 | |
*** e0ne has joined #openstack-infra | 22:02 | |
clarkb | ya lots of hits if I remove the test filter | 22:03 |
fungi | logan-: aha, `console url show ...` did the trick. hooray for people running *actual* openstack! | 22:03 |
clarkb | imacdonn: I'll go bug neutron | 22:03 |
*** auristor has joined #openstack-infra | 22:06 | |
imacdonn | clarkb: thanks! I'll watch there | 22:06 |
fungi | yay! i can ssh into mirror02.regionone.limestone now, after manually rerunning fsck with -y via a root shell on the oob console | 22:08 |
*** rh-jelabarre has quit IRC | 22:08 | |
logan- | great | 22:08 |
clarkb | fungi: was it refusing to complete the boot without a fsck? | 22:08 |
fungi | depends on your definition of "refusing," "complete" and "boot" i guess ;) | 22:08 |
fungi | i happily complained that one of the filesystems in /etc/fstab had errors, and then helpfully dropped to a root shell in single-user mode | 22:09 |
fungi | er, it happily complainec | 22:09 |
fungi | something | 22:10 |
fungi | it's getting to that time of day where my typing is even more atrocious than usual | 22:10 |
clarkb | ah | 22:11 |
*** slaweq has quit IRC | 22:12 | |
clarkb | fungi: have we reenabled that region in nodepool yet? | 22:22 |
fungi | not yet | 22:24 |
fungi | i'm in a bunch of conversations, trying to finish checking the mirror out | 22:24 |
fungi | apache and afs caches look sane, hitting from a browser | 22:26 |
*** ianw_pto is now known as ianw | 22:26 | |
*** boden has quit IRC | 22:26 | |
fungi | and the filesystems are mounted and no errors are being reported | 22:26 |
ianw | HNY everyone | 22:26 |
fungi | half-normal yodelling to you too! | 22:27 |
clarkb | ianw: hello | 22:28 |
mordred | hello ianw! | 22:29 |
*** slaweq has joined #openstack-infra | 22:30 | |
*** e0ne has quit IRC | 22:32 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Update upload-logs to process docs as well https://review.openstack.org/511853 | 22:32 |
*** slaweq has quit IRC | 22:34 | |
fungi | #status log nl02 has been removed from the emergency maintenance list now that the filesystems on mirror02.regionone.limestone have been repaired and checked out | 22:41 |
openstackstatus | fungi: finished logging | 22:41 |
*** diablo_rojo has joined #openstack-infra | 22:43 | |
*** rcernin has joined #openstack-infra | 22:45 | |
*** tosky has joined #openstack-infra | 22:52 | |
manjeets | clarkb, hi, adding success-comment in pipeline.yaml didn't make it go to the CI section | 22:54 |
clarkb | manjeets: I'm not sure then | 22:54 |
fungi | manjeets: can you link to an example review where your ci system added a comment? | 22:55 |
*** eernst has joined #openstack-infra | 22:55 | |
manjeets | fungi, https://review.openstack.org/#/c/603501/ | 22:56 |
manjeets | comments from Intel SriovTaas CI check | 22:56 |
openstackgerrit | Merged openstack-infra/zuul master: Add timer for starting_builds https://review.openstack.org/623468 | 22:57 |
*** eernst_ has joined #openstack-infra | 22:57 | |
*** eernst has quit IRC | 22:57 | |
*** tmorin has joined #openstack-infra | 22:57 | |
fungi | manjeets: thanks, i think the account display name needs to be adjusted to just "Intel SriovTaas CI" without the "check" on the end of the account name | 22:57 |
manjeets | fungi, i'll try that too thanks !! | 22:58 |
tmorin | frickler: you can now release the CI node you had frozen earlier today to let me debug, I gathered enough information to explore a path to a solution -- many thanks! | 22:58 |
*** eernst_ has quit IRC | 22:58 | |
*** eernst has joined #openstack-infra | 22:59 | |
fungi | manjeets: you can see the regular expression used to match on the display names for comments at https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/files/gerrit/hideci.js#n19 | 22:59 |
fungi | var ciRegex = /^(.* CI|Jenkins|Zuul)$/; | 22:59 |
fungi | so if there's something after the "CI" in the name, that will cause it not to match | 22:59 |
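The anchored $ is easy to demonstrate against the same pattern with grep (illustrative):

    # matches: prints the name and exits 0
    echo "Intel NFV CI" | grep -E '^(.* CI|Jenkins|Zuul)$'
    # no match: the trailing "check" falls outside the $ anchor
    echo "Intel SriovTaas CI check" | grep -E '^(.* CI|Jenkins|Zuul)$'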
fungi | clarkb: puppet is running on lists.o.o now | 23:00 |
fungi | and just finished | 23:00 |
tmorin | ( frickler, or anyone else acting as infra-root, the CI node that can be freed was ubuntu-xenial-inap-mtl01-0001542013 ) | 23:00 |
manjeets | fungi, I wonder how this is working here https://review.openstack.org/#/c/629041/ | 23:00 |
fungi | clarkb: looks like it was (properly) a no-op? | 23:00 |
manjeets | there's Intel NFV CI check | 23:00 |
fungi | manjeets: "check" is the pipeline name. if you hit the "toggle ci" button at the bottom of the page you'll see the display name for that account is just "Intel NFV CI" with no "check" after it | 23:01 |
*** eernst has quit IRC | 23:02 | |
fungi | the "check" part is taken from the job report string, where it says "Build succeeded (check pipeline)." | 23:02 |
manjeets | got it, thanks fungi! for some reason I read that thinking "check" was part of the name | 23:02 |
manjeets | my bad | 23:02 |
manjeets | fungi, cool that worked https://review.openstack.org/#/c/603501/33 | 23:02 |
clarkb | fungi: yup looks like a proper noop. The next services in the list are openstackid. Any idea if we are puppeting those currently? <- smarcet may know too | 23:02 |
manjeets | thanks ! | 23:03 |
imacdonn | clarkb: argh, my recheck failed on that thing that looks like an address conflict (ssh timeout)! can't win! http://logs.openstack.org/93/612393/21/check/cinder-tempest-dsvm-lvm-lio-barbican/ebc3a73/ | 23:03 |
fungi | clarkb: we are not (currently) puppeting openstackid.org production, while smarcet works through updating openstackid to newer php on openstackid-dev | 23:03 |
fungi | manjeets: great! happy you got it worked out | 23:04 |
clarkb | fungi: should we go ahead and flip openstackid-dev to futureparser now? | 23:04 |
clarkb | then we can flip the switch for prod too and it should work if -dev is happy | 23:04 |
fungi | clarkb: we might want to double-check that it won't complicate what smarcet is doing on openstackid-dev now. also i think he wants to couple this with server rebuilds on xenial (they're still trusty) | 23:05 |
clarkb | fungi: ok, I'm happy either way. We managed to get through a whole chunk of services onto futureparser and we can rebase the list order if necessary from this point forward | 23:06 |
clarkb | I expect it's only a small handful of services now | 23:06 |
*** tmorin has quit IRC | 23:06 | |
fungi | yeah, might make sense to bump those further up the list or something | 23:06 |
fungi | also http://grafana.openstack.org/d/WFOSH5Siz/nodepool-limestone says 50 nodes in use again, so ansible/puppet has restored the old max-servers | 23:07 |
fungi | if tmorin comes back, that was one of at least 3 nodes held with the same comment, so i'm not sure whether he's done with all of them or just that one | 23:10 |
*** rascasoft has quit IRC | 23:10 | |
fungi | mnaser: are you done with the magnum-kubernetes-conformance troubleshooting for that last pair of nodes we held a week ago? | 23:10 |
*** smarcet has joined #openstack-infra | 23:13 | |
*** diablo_rojo has quit IRC | 23:14 | |
openstackgerrit | Luigi Toscano proposed openstack-infra/project-config master: Basic job and queue definitions for sahara-plugin-* https://review.openstack.org/629068 | 23:14 |
fungi | corvus: clarkb: ansible added the keys under /etc/bind/keys/zuulci.org and bind seemed to be aware of them, but didn't update /var/lib/bind/zones/zuulci.org/zone.db.signed until i issued a `sudo rndc loadkeys zuulci.org` | 23:19 |
fungi | though it's still serving that same older soa with the 1526407320 serial from may | 23:21 |
*** ianychoi has quit IRC | 23:21 | |
fungi | so all it seems to have done is refresh the signature on the old zone content? | 23:22 |
fungi | i wonder if we should clear out the contents of /var/lib/bind/zones/zuulci.org and let the signatures get recreated fresh | 23:23 |
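Assuming the zone is set up for automatic maintenance (e.g. auto-dnssec maintain with inline signing), forcing a fresh sign would look roughly like:

    # pick up the newly installed keys and force a full re-sign
    sudo rndc loadkeys zuulci.org
    sudo rndc sign zuulci.org
    # "next resign time" should now be populated
    sudo rndc zonestatus zuulci.org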
fungi | is anybody else having trouble pulling up https://etherpad.openstack.org/ right now? | 23:24 |
openstackgerrit | Merged openstack-infra/openstackid-resources master: Migration to PHP 7.x https://review.openstack.org/616226 | 23:24 |
fungi | i can't ssh to it at all | 23:24 |
fungi | via ipv6 or ipv4 | 23:25 |
fungi | i wonder if the host just crashed out from under it | 23:25 |
clarkb | ssh not working via ipv4 from here | 23:25 |
clarkb | it does ping, console might say something interesting? | 23:25 |
fungi | jumping into rackspace dashboard, yeah | 23:26 |
fungi | "Loading Console ..." | 23:28 |
fungi | is all it gives me | 23:28 |
fungi | expecting an e-mail from fanatical support to infra-root@o.o in 3... 2... 1... | 23:28 |
fungi | cacti says load average and iowait went through the roof just before it went dead for us | 23:35 |
fungi | i wanted to check whether this was a good opportunity to rebuild it on xenial while it's offline anyway, so i went to pull up the etherpad where we had the list of remaining servers to upgrade... :/ | 23:39 |
fungi | anyway, etherpad-dev seems to already be on xenial so i suspect etherpad.o.o is as well | 23:39 |
clarkb | yup | 23:40 |
clarkb | afs*, kdc*, groups*, health, status, lists.*, openstackid*, ask, graphite, pbx, refstack, static and wiki-dev are the remaining servers | 23:41 |
clarkb | we also need to rm puppetmaster at some point (it's still trusty but is replaced with bridge which is bionic) | 23:41 |
clarkb | fungi: ^ that is from my cached copy of the etherpad | 23:41 |
fungi | cacti is still reporting values for snmp polls in the past few minutes, so i was about to say maybe the host is up... | 23:42 |
fungi | "This message is to inform you that the host your cloud server 'etherpad01.openstack.org' resides on alerted our monitoring systems at 23:41 UTC. We are currently investigating the issue and will update you as soon as we have additional information regarding what is causing the alert. Please do not access or modify 'etherpad01.openstack.org' during this process. Please reference this incident ID if | 23:43 |
fungi | you need to contact support: CSHD-9wZ2KeoQVvD" | 23:43 |
fungi | #status notice The Etherpad service at https://etherpad.openstack.org/ has been offline since 23:22 UTC due to a hypervisor issue in our service provider, but should hopefully return to service shortly. | 23:47 |
openstackstatus | fungi: sending notice | 23:47 |
*** tosky has quit IRC | 23:48 | |
-openstackstatus- NOTICE: The Etherpad service at https://etherpad.openstack.org/ has been offline since 23:22 UTC due to a hypervisor issue in our service provider, but should hopefully return to service shortly. | 23:49 | |
corvus | mordred: regardless of whether we think it's ready for gitea; i bet we could do an HA etherpad/percona in k8s. | 23:50 |
openstackstatus | fungi: finished sending notice | 23:51 |
fungi | that would be neat | 23:51 |
clarkb | corvus: the one gotcha there is only one nodejs process can serve all clients for a single pad | 23:51 |
clarkb | corvus: so you have to have some fairly intelligent load balancing happening | 23:52 |
mordred | corvus: ++ | 23:52 |
fungi | high-availability doesn't necessarily imply active/active | 23:52 |
clarkb | fungi: ya active/standby would be the simplest way to do it probably | 23:52 |
fungi | we could get away with active/standby probably (with some data loss at failover) | 23:52 |
mordred | fungi: yah. just being able to start a new process quickly on a different backend node as things go south would be a nice win | 23:52 |
corvus | yeah, i guess we could have just one etherpad pod which gets auto-rescheduled, or we could probably have a stateful LB. | 23:52 |
mordred | yah. could do both | 23:53 |
corvus | so an active-active percona system with a single re-schedulable etherpad server. piece of cake. | 23:53 |
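A minimal sketch of the "single re-schedulable etherpad" half in Kubernetes (the image name, tag, and port are assumptions; the percona side is elided):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: etherpad
    spec:
      replicas: 1        # only one nodejs process may serve a given pad
      strategy:
        type: Recreate   # never run old and new pods side by side
      selector:
        matchLabels:
          app: etherpad
      template:
        metadata:
          labels:
            app: etherpad
        spec:
          containers:
            - name: etherpad
              image: etherpad/etherpad:1.7  # assumed image/tag
              ports:
                - containerPort: 9001       # etherpad's default listen port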
fungi | looks like the server is back up and responding to ssh again | 23:53 |
mordred | yup | 23:53 |
fungi | 23:53:51 up 1 min, 1 user, load average: 0.18, 0.06, 0.01 | 23:53 |
corvus | (and by piece of cake, i mean "mordred did all that percona work already" :) | 23:54 |
fungi | heh | 23:54 |
fungi | when you say "percona" you're referring to "Percona Server for MySQL"? | 23:55 |
*** dave-mccowan has quit IRC | 23:56 | |
clarkb | I'm guessing galera | 23:56 |
fungi | a la https://github.com/percona/percona-server | 23:56 |
*** jamesmcarthur has quit IRC | 23:56 | |
clarkb | which does active/active/active mysql | 23:56 |
mordred | percona xtradb cluster | 23:59 |
mordred | https://review.openstack.org/#/c/626054/ | 23:59 |