*** agopi|brb has quit IRC | 00:00 | |
*** rpioso is now known as rpioso|afk | 00:10 | |
*** betherly has joined #openstack-infra | 00:11 | |
*** betherly has quit IRC | 00:15 | |
*** darvon has joined #openstack-infra | 00:21 | |
*** betherly has joined #openstack-infra | 00:31 | |
*** betherly has quit IRC | 00:36 | |
*** agopi|brb has joined #openstack-infra | 00:37 | |
*** agopi|brb has quit IRC | 00:39 | |
*** agopi|brb has joined #openstack-infra | 00:39 | |
*** longkb has joined #openstack-infra | 00:40 | |
*** betherly has joined #openstack-infra | 00:52 | |
*** diablo_rojo has quit IRC | 00:55 | |
*** betherly has quit IRC | 00:56 | |
*** ansmith_ has joined #openstack-infra | 01:02 | |
*** jamesmcarthur has joined #openstack-infra | 01:03 | |
*** jamesmcarthur has quit IRC | 01:05 | |
*** jamesmcarthur has joined #openstack-infra | 01:05 | |
*** betherly has joined #openstack-infra | 01:12 | |
*** betherly has quit IRC | 01:17 | |
*** mrsoul has joined #openstack-infra | 01:17 | |
*** betherly has joined #openstack-infra | 01:33 | |
*** jamesmcarthur has quit IRC | 01:35 | |
*** betherly has quit IRC | 01:37 | |
*** carl_cai has quit IRC | 01:48 | |
*** lbragstad has quit IRC | 01:49 | |
*** lbragstad has joined #openstack-infra | 01:49 | |
*** betherly has joined #openstack-infra | 01:53 | |
*** rcernin has joined #openstack-infra | 01:55 | |
*** betherly has quit IRC | 01:58 | |
*** jamesmcarthur has joined #openstack-infra | 02:05 | |
*** liusheng__ has joined #openstack-infra | 02:07 | |
*** jamesmcarthur has quit IRC | 02:09 | |
*** betherly has joined #openstack-infra | 02:14 | |
*** bobh has joined #openstack-infra | 02:14 | |
*** betherly has quit IRC | 02:19 | |
openstackgerrit | Merged openstack-infra/project-config master: Disable inap-mtl01 provider https://review.openstack.org/613418 | 02:22 |
dmsimard | Would it be a good idea to force http -> https redirection on our things that are available over ssl ? | 02:27 |
dmsimard | logs, git, zuul, etc | 02:27 |
dmsimard | I could write a patch like that | 02:27 |
*** adrianreza has joined #openstack-infra | 02:31 | |
*** betherly has joined #openstack-infra | 02:34 | |
*** betherly has quit IRC | 02:39 | |
*** bhavikdbavishi has joined #openstack-infra | 02:47 | |
dmsimard | What's the thing that closes PRs on github with a template ? | 02:47 |
*** rh-jelabarre has quit IRC | 02:48 | |
*** roman_g has quit IRC | 02:49 | |
*** betherly has joined #openstack-infra | 02:54 | |
*** betherly has quit IRC | 03:00 | |
*** bobh has quit IRC | 03:10 | |
*** ykarel|away has joined #openstack-infra | 03:25 | |
*** dpawlik has quit IRC | 03:27 | |
*** carl_cai has joined #openstack-infra | 03:27 | |
*** cfriesen has quit IRC | 03:29 | |
*** dpawlik has joined #openstack-infra | 03:29 | |
*** betherly has joined #openstack-infra | 03:35 | |
*** betherly has quit IRC | 03:40 | |
*** udesale has joined #openstack-infra | 03:51 | |
*** betherly has joined #openstack-infra | 03:56 | |
*** lpetrut has joined #openstack-infra | 03:58 | |
*** betherly has quit IRC | 04:00 | |
*** dave-mccowan has quit IRC | 04:14 | |
*** janki has joined #openstack-infra | 04:22 | |
ianw | clarkb: https://review.openstack.org/613503 Call pre/post run task calls from TaskManager.submit_task() I think explains our missing nodepool logs | 04:28 |
*** lpetrut has quit IRC | 04:34 | |
*** dpawlik has quit IRC | 04:36 | |
*** dpawlik has joined #openstack-infra | 04:39 | |
*** kjackal has joined #openstack-infra | 04:45 | |
*** ramishra has joined #openstack-infra | 05:12 | |
*** yamamoto has quit IRC | 05:26 | |
*** yamamoto has joined #openstack-infra | 05:26 | |
*** kjackal has quit IRC | 05:29 | |
*** carl_cai has quit IRC | 05:33 | |
*** betherly has joined #openstack-infra | 05:36 | |
*** betherly has quit IRC | 05:40 | |
*** bhavikdbavishi1 has joined #openstack-infra | 05:47 | |
*** trown has joined #openstack-infra | 05:49 | |
*** kopecmartin has joined #openstack-infra | 05:50 | |
*** elod_ has joined #openstack-infra | 05:50 | |
*** evrardjp_ has joined #openstack-infra | 05:51 | |
*** quiquell|off is now known as quiquell | 05:53 | |
*** jpenag has joined #openstack-infra | 05:53 | |
*** hemna_ has joined #openstack-infra | 05:54 | |
*** ianw_ has joined #openstack-infra | 05:54 | |
*** dims_ has joined #openstack-infra | 05:54 | |
*** bhavikdbavishi has quit IRC | 05:55 | |
*** apetrich has quit IRC | 05:55 | |
*** dhill_ has quit IRC | 05:55 | |
*** Diabelko has quit IRC | 05:55 | |
*** SotK has quit IRC | 05:55 | |
*** gothicmindfood has quit IRC | 05:55 | |
*** kopecmartin|off has quit IRC | 05:55 | |
*** dims has quit IRC | 05:55 | |
*** dulek has quit IRC | 05:55 | |
*** jpena|off has quit IRC | 05:55 | |
*** strigazi has quit IRC | 05:55 | |
*** elod has quit IRC | 05:55 | |
*** nhicher has quit IRC | 05:55 | |
*** lucasagomes has quit IRC | 05:55 | |
*** gnuoy has quit IRC | 05:55 | |
*** hemna has quit IRC | 05:55 | |
*** evrardjp has quit IRC | 05:55 | |
*** mudpuppy has quit IRC | 05:55 | |
*** mattoliverau has quit IRC | 05:55 | |
*** cgoncalves has quit IRC | 05:55 | |
*** brwyatt has quit IRC | 05:55 | |
*** emerson has quit IRC | 05:55 | |
*** bradm has quit IRC | 05:55 | |
*** chkumar|off has quit IRC | 05:55 | |
*** ianw has quit IRC | 05:55 | |
*** Qiming has quit IRC | 05:55 | |
*** jlvillal has quit IRC | 05:55 | |
*** aluria has quit IRC | 05:55 | |
*** mdrabe has quit IRC | 05:55 | |
*** mpjetta has quit IRC | 05:55 | |
*** Keitaro has quit IRC | 05:55 | |
*** trown|outtypewww has quit IRC | 05:55 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 05:55 | |
*** ianw_ is now known as ianw | 05:55 | |
*** brwyatt has joined #openstack-infra | 05:56 | |
*** irclogbot_1 has quit IRC | 05:58 | |
*** apetrich has joined #openstack-infra | 06:02 | |
*** dhill_ has joined #openstack-infra | 06:02 | |
*** Diabelko has joined #openstack-infra | 06:03 | |
*** Keitaro has joined #openstack-infra | 06:05 | |
*** chandankumar has joined #openstack-infra | 06:06 | |
*** ykarel|away is now known as ykarel | 06:10 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: New Repo: OpenStack-Helm Docs https://review.openstack.org/611893 | 06:20 |
*** ccamacho has quit IRC | 06:20 | |
*** xinliang has joined #openstack-infra | 06:21 | |
AJaeger | config-core, two new repos for review, please https://review.openstack.org/#/c/611892 and https://review.openstack.org/611893 | 06:22 |
AJaeger | dmsimard: openstack-infra/jeepyb/jeepyb/cmd/close_pull_requests.py - let me fix quickly... | 06:23 |
*** gfidente has joined #openstack-infra | 06:26 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/jeepyb master: Use https for links https://review.openstack.org/613509 | 06:28 |
AJaeger | dmsimard: ^ | 06:28 |
*** aojeagarcia has joined #openstack-infra | 06:29 | |
*** aojea has quit IRC | 06:33 | |
*** ccamacho has joined #openstack-infra | 06:41 | |
*** bhavikdbavishi has quit IRC | 06:48 | |
*** ccamacho has quit IRC | 06:49 | |
*** ccamacho has joined #openstack-infra | 06:51 | |
*** yamamoto has quit IRC | 06:53 | |
*** yamamoto has joined #openstack-infra | 06:53 | |
*** yamamoto has quit IRC | 06:53 | |
*** yamamoto has joined #openstack-infra | 06:54 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/613511 | 06:55 |
*** quiquell is now known as quiquell|brb | 06:57 | |
*** ginopc has joined #openstack-infra | 07:07 | |
*** quiquell|brb is now known as quiquell | 07:14 | |
*** rcernin has quit IRC | 07:22 | |
*** ykarel is now known as ykarel|lunch | 07:24 | |
*** shardy has joined #openstack-infra | 07:26 | |
*** strigazi has joined #openstack-infra | 07:26 | |
*** bauzas is now known as bauwser | 07:35 | |
*** witek has quit IRC | 07:35 | |
*** xek has joined #openstack-infra | 07:35 | |
*** evrardjp_ is now known as evrardjp | 07:37 | |
*** tosky has joined #openstack-infra | 07:46 | |
*** hashar has joined #openstack-infra | 07:53 | |
*** kjackal has joined #openstack-infra | 07:57 | |
*** rossella_s has joined #openstack-infra | 08:00 | |
*** jpich has joined #openstack-infra | 08:03 | |
*** SotK has joined #openstack-infra | 08:06 | |
*** elod_ is now known as elod | 08:07 | |
*** carl_cai has joined #openstack-infra | 08:22 | |
*** derekh has joined #openstack-infra | 08:23 | |
openstackgerrit | Merged openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/613511 | 08:29 |
*** ccamacho has quit IRC | 08:32 | |
*** panda|off is now known as panda | 08:32 | |
*** lucasagomes has joined #openstack-infra | 08:33 | |
*** ccamacho has joined #openstack-infra | 08:33 | |
openstackgerrit | Frank Kloeker proposed openstack-infra/openstack-zuul-jobs master: Rename index file of doc translations https://review.openstack.org/613531 | 08:35 |
ianw | hrm, there seems to be something up with http://mirror.regionone.limestone.openstack.org/ | 08:40 |
ianw | #status log restarted apache2 service on mirror.regionone.limestone.openstack.org | 08:41 |
openstackstatus | ianw: finished logging | 08:41 |
ianw | nothing really odd in the logs | 08:41 |
*** ykarel|lunch is now known as ykarel | 08:42 | |
*** dulek has joined #openstack-infra | 08:46 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Remove ironic-bfv and ironic-ui meetings https://review.openstack.org/612695 | 09:05 |
*** xinliang has quit IRC | 09:09 | |
*** e0ne has joined #openstack-infra | 09:16 | |
*** kjackal_v2 has joined #openstack-infra | 09:16 | |
*** kjackal has quit IRC | 09:20 | |
*** xinliang has joined #openstack-infra | 09:21 | |
*** kjackal_v2 has quit IRC | 09:28 | |
*** kjackal has joined #openstack-infra | 09:28 | |
*** Qiming has joined #openstack-infra | 09:35 | |
*** yamamoto has quit IRC | 09:36 | |
*** alexchadin has joined #openstack-infra | 09:37 | |
*** electrofelix has joined #openstack-infra | 09:58 | |
*** dpawlik has quit IRC | 10:03 | |
*** dpawlik_ has joined #openstack-infra | 10:03 | |
*** lpetrut has joined #openstack-infra | 10:11 | |
*** jamesmcarthur has joined #openstack-infra | 10:12 | |
*** ssbarnea has joined #openstack-infra | 10:12 | |
*** jamesmcarthur has quit IRC | 10:16 | |
*** bhavikdbavishi has joined #openstack-infra | 10:28 | |
*** bhavikdbavishi has quit IRC | 10:32 | |
mtreinish | fungi: we should be already running the fix for 7651, we switched to the ppa to get 1.15 (which includes the fix for 7651) | 10:32 |
mtreinish | fungi: also persia and I backported that fix for ubuntu at the dublin ptg: https://bugs.launchpad.net/ubuntu/+source/mosquitto/+bug/1752591 | 10:32 |
openstack | Launchpad bug 1752591 in mosquitto (Ubuntu Bionic) "CVE-2017-7651 and CVE-2017-7652" [Undecided,Fix released] | 10:32 |
mtreinish | so unfortunately I don't think it will fix our crashing issue, that's a bug with the log handling | 10:33 |
*** bhavikdbavishi has joined #openstack-infra | 10:34 | |
*** jpenag is now known as jpena | 10:36 | |
*** bhavikdbavishi has quit IRC | 10:38 | |
mtreinish | fungi: it probably doesn't hurt to bump up the version, but I'm not optimistic that it would fix the crashing | 10:40 |
*** betherly has joined #openstack-infra | 10:41 | |
*** kjackal has quit IRC | 10:46 | |
*** pbourke has quit IRC | 10:48 | |
*** pbourke has joined #openstack-infra | 10:48 | |
*** dtantsur|afk is now known as dtantsur | 10:48 | |
*** ssbarnea has quit IRC | 10:49 | |
*** e0ne has quit IRC | 10:51 | |
*** e0ne_ has joined #openstack-infra | 10:52 | |
*** alexchadin has quit IRC | 10:52 | |
*** AJaeger_ has joined #openstack-infra | 10:57 | |
*** AJaeger has quit IRC | 10:59 | |
*** jpena is now known as jpena|lunch | 11:01 | |
mtreinish | fungi: we set it to 'present' in the puppet. So bumping the package will have to be done manually: https://git.openstack.org/cgit/openstack-infra/puppet-mosquitto/tree/manifests/init.pp#n16 | 11:06 |
slaweq | hi infra team | 11:06 |
slaweq | I just spotted an error like: http://logs.openstack.org/14/613314/1/check/neutron-grenade-multinode/6874aba/job-output.txt.gz#_2018-10-26_09_08_36_328644 (/tmp/ansible/bin/ara: No such file or directory) in two different jobs running on the Neutron rocky branch, do you know what could cause that? | 11:07 |
*** kjackal has joined #openstack-infra | 11:10 | |
*** dave-mccowan has joined #openstack-infra | 11:15 | |
*** EmilienM is now known as EvilienM | 11:24 | |
*** udesale has quit IRC | 11:28 | |
*** panda is now known as panda|lunch | 11:29 | |
*** hashar is now known as hasharAway | 11:31 | |
*** ramishra has quit IRC | 11:31 | |
*** janki has quit IRC | 11:36 | |
*** ansmith_ has quit IRC | 11:39 | |
*** rh-jelabarre has joined #openstack-infra | 11:43 | |
*** longkb has quit IRC | 11:49 | |
*** jpena|lunch is now known as jpena | 11:57 | |
*** ykarel is now known as ykarel|away | 11:58 | |
*** yamamoto has joined #openstack-infra | 12:02 | |
*** carl_cai has quit IRC | 12:02 | |
*** ykarel|away has quit IRC | 12:02 | |
*** jcoufal has joined #openstack-infra | 12:04 | |
*** kjackal has quit IRC | 12:08 | |
*** kjackal has joined #openstack-infra | 12:09 | |
*** emerson has joined #openstack-infra | 12:15 | |
dmsimard | slaweq: that comes from devstack-gate: http://codesearch.openstack.org/?q=%2Ftmp%2Fansible%2Fbin%2Fara&i=nope&files=&repos= | 12:16 |
dmsimard | The ara not found is intriguing. I need to drop kids at school, I'll be able to check in ~20 minutes | 12:18 |
slaweq | dmsimard: thx a lot | 12:23 |
*** panda|lunch is now known as panda | 12:24 | |
*** eharney has joined #openstack-infra | 12:25 | |
*** bobh has joined #openstack-infra | 12:26 | |
*** e0ne_ has quit IRC | 12:26 | |
*** carl_cai has joined #openstack-infra | 12:29 | |
*** yamamoto has quit IRC | 12:32 | |
*** rlandy has joined #openstack-infra | 12:36 | |
*** quiquell is now known as quiquell|lunch | 12:37 | |
fungi | dmsimard: that should be pretty easy to do. we already have some sites/services we do that for (e.g. review, docs, governance, security) so i'd argue there's not a lot of reason to serve any of the rest of them via both http+https these days anyway | 12:40 |
*** agopi|brb is now known as agopi | 12:41 | |
fungi | looks like the releases site redirects http->https as well | 12:41 |
fungi | should be able to just copy configuration from one or more of those, and apply it to anything in our ssl cert check config which is missing that | 12:42 |
*** roman_g has joined #openstack-infra | 12:43 | |
openstackgerrit | Simon Westphahl proposed openstack-infra/zuul master: Use branch for grouping in supercedent manager https://review.openstack.org/613335 | 12:44 |
dmsimard | slaweq: if you look a bit above that ara command not found error, you'll see that we failed to install ansible in the first place.. looks like a timeout to the limestone mirror http://logs.openstack.org/14/613314/1/check/neutron-grenade-multinode/6874aba/job-output.txt.gz#_2018-10-26_08_42_51_581644 | 12:44 |
slaweq | dmsimard: thx for investigating that, so it looks like it was probably a temporary issue on one cloud provider only | 12:45 |
dmsimard | slaweq: the server looks healthy and reachable right now, there may have been a temporary network issue | 12:46 |
dmsimard | please recheck and let us know if it reoccurs | 12:46 |
dmsimard | fungi: ok, I'll take a stab at it | 12:47 |
*** jamesmcarthur has joined #openstack-infra | 12:47 | |
slaweq | dmsimard: sure, thx a lot | 12:47 |
fungi | slaweq: dmsimard: earlier (08:40z in scrollback) ianw noted that apache had died on that mirror and he restarted it. also logged at https://wiki.openstack.org/wiki/Infrastructure_Status | 12:48 |
dmsimard | ah, well there we go | 12:49 |
dmsimard | I'm not fully awake yet haha | 12:49 |
fungi | np, i'm already well on my way to caffeination | 12:50 |
*** mdrabe has joined #openstack-infra | 12:53 | |
*** yamamoto has joined #openstack-infra | 12:55 | |
quiquell|lunch | fungi: Do you know why I have "This change depends on a change that failed to merge" here https://review.openstack.org/#/c/613297/ | 12:56 |
quiquell|lunch | fungi: all of them have been rebased | 12:56 |
fungi | quiquell|lunch: the timing of the message is usually an indicator | 12:57 |
quiquell|lunch | fungi: ahh wait... I didn't rebase one of the... git pull --rebase does not do the job | 12:58 |
slaweq | fungi: thx also for help | 12:58 |
fungi | quiquell|lunch: you uploaded patchset #5 at 10:15z, so it was queued for testing or possibly in the midst of running some jobs, then at 11:11z one of its dependencies got uploaded | 12:58 |
logan- | regarding the limestone mirror apache issue, the disk is 90% full because of the base image churn from yesterday. there are 2 sets of base images cached on all of the nodes currently until nova deletes the old nodepool images today. | 12:59 |
quiquell|lunch | fungi: ack thanks ! | 12:59 |
fungi | quiquell|lunch: and so it was queued to test with dependent change 613316,2 but you uploaded 613316,3 so zuul was informing you that the original dependency can never merge now and it has aborted the queued/running jobs | 12:59 |
*** ansmith_ has joined #openstack-infra | 12:59 | |
logan- | i will remove that hv from the aggregate for now so no nodepool images will get scheduled there, that will keep the usage steady until the cleanup occurs | 12:59 |
fungi | quiquell|lunch: a recheck of 613297 will queue it to test with the new dependency you uploaded | 13:00 |
*** kgiusti has joined #openstack-infra | 13:00 | |
*** dave-mccowan has quit IRC | 13:00 | |
fungi | logan-: thanks! one thing worth noting, to work around the full disk issues crashing the mirror vm completely we "preallocated" the remaining rootfs by writing zeroes to a file and then deleting it once we hit enospc | 13:01 |
*** derekh has quit IRC | 13:01 | |
logan- | yeah, I suspect the disk hit 100% at some point this morning (90% right now with 12 nodepool vms running), and the preallocation probably prevented it from crashing ;) | 13:03 |
*** quiquell|lunch is now known as quiquell | 13:03 | |
logan- | 218G of cached images weighing heavy on it heh | 13:04 |
*** derekh has joined #openstack-infra | 13:04 | |
*** derekh has quit IRC | 13:04 | |
fungi | oof! | 13:04 |
fungi | how old are some of those? are we leaking images? that sounds like rather more than i would expect | 13:05 |
*** derekh has joined #openstack-infra | 13:05 | |
logan- | i think everything was rebuilt simultaneously yesterday during the zuul/nodepool maintenance so we ended up with 2x the number of images cached than normal | 13:05 |
fungi | we should only ever at most have 3x the number of image labels we've defined (current, previous as a safety fallback, and one uploading before the oldest gets deleted) | 13:06 |
logan- | because iirc nova keeps the base images cached on the hv for 24h after their last use | 13:06 |
fungi | ohhh | 13:06 |
fungi | so on the compute nodes, not in glance | 13:06 |
logan- | since that maintenance is coming up on 24h i think this should just work itself out over the next few hours and then I can put the host back in the aggregate :) | 13:06 |
logan- | yup | 13:07 |
openstackgerrit | Sorin Sbarnea proposed openstack-dev/pbr master: Correct documentation hyperlink for environment-markers https://review.openstack.org/613576 | 13:07 |
fungi | also i think in glance we'll generally run much closer to 2x than 3x because we only upload one image at a time and then delete the oldest for that label | 13:07 |
*** tpsilva has joined #openstack-infra | 13:09 | |
logan- | yup, glance is on a 30TB ceph pool so no concerns there | 13:10 |
logan- | images leak often but I think clarkb cleaned up all of the old leaked images yesterday | 13:10 |
*** AJaeger_ is now known as AJaeger | 13:18 | |
*** dpawlik_ has quit IRC | 13:18 | |
*** mriedem has joined #openstack-infra | 13:19 | |
*** dpawlik has joined #openstack-infra | 13:20 | |
*** efried is now known as fried_rice | 13:23 | |
*** e0ne has joined #openstack-infra | 13:25 | |
*** chandankumar is now known as chkumar|off | 13:33 | |
fungi | yes, i believe he did shortly after the upgrade | 13:35 |
fungi | er, the zk cluster replacement for nodepool i mean | 13:35 |
*** agopi is now known as agopi|brb | 13:35 | |
*** agopi|brb has quit IRC | 13:40 | |
*** jamesmcarthur has quit IRC | 13:46 | |
ssbarnea|bkp2 | fungi: regarding moving browbeat config to repo at https://review.openstack.org/#/c/613092 -- already merged in repo, do we need to keep the publish-to-pypi inside project-config or can we remove the entire section? | 13:48 |
ssbarnea|bkp2 | it is already listed inside repo. | 13:48 |
*** boden has joined #openstack-infra | 13:50 | |
fungi | ssbarnea|bkp2: it looks like other official projects have kept the publish-to-pypi or publish-to-pypi-python3 template application in project-config but i'll admit i haven't been following the goal work there closely enough to know for sure whether that's intended (i have to assume it must be?). AJaeger: do you know the reason for that? | 13:51 |
*** bnemec has joined #openstack-infra | 13:54 | |
*** munimeha1 has joined #openstack-infra | 13:58 | |
*** agopi|brb has joined #openstack-infra | 14:05 | |
*** agopi|brb is now known as agopi | 14:05 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: DNM Link to change page from status panel https://review.openstack.org/613593 | 14:10 |
*** onovy has quit IRC | 14:16 | |
AJaeger | fungi: we left them in project-config since tagging does not know about pipelines, so a branched project needs to have the job declared in project-config. This is mentioned in infra-manual as well | 14:17 |
AJaeger | ssbarnea|bkp2: https://review.openstack.org/#/c/613004/6/.zuul.yaml did *not* import publish-to-pypi, it's not in-repo | 14:17 |
AJaeger | ssbarnea|bkp2, fungi, so https://review.openstack.org/#/c/613092 is fine to +2A IMHO. | 14:18 |
ssbarnea|bkp2 | AJaeger: no worry. i can add it. i just wanted to know if there is something preventing a full move. | 14:18 |
*** gfidente has quit IRC | 14:18 | |
AJaeger | ssbarnea|bkp2: https://docs.openstack.org/infra/manual/creators.html#central-config-exceptions | 14:19 |
*** dpawlik has quit IRC | 14:19 | |
fungi | AJaeger: ahh, right, we still haven't decided on the possible https://review.openstack.org/578557 behavior change for that | 14:21 |
*** jamesmcarthur has joined #openstack-infra | 14:21 | |
ssbarnea|bkp2 | AJaeger: so i was remembering something from that doc. Still "should" in specs is such a gray area... :) | 14:21 |
*** stephenfin is now known as finucannot | 14:22 | |
*** dpawlik has joined #openstack-infra | 14:24 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: quick-start: add a note about github https://review.openstack.org/613398 | 14:25 |
*** tosky has quit IRC | 14:25 | |
*** roman_g has quit IRC | 14:27 | |
boden | hi.. I've been trying to update tricircle's in repo zuul config and tox for zuul v3 (they appear to be out of date) in https://review.openstack.org/#/c/612729/ previously they were installing required projects in tox, but when I remove that and add them to the .zuul.conf there's import errors http://logs.openstack.org/29/612729/3/check/openstack-tox-py27/a00e36e/job-output.txt.gz#_2018-10-25_20_17_28_206916 I see | 14:28 |
boden | neutron installed as a sibling so I'm confused as to root cause of import err | 14:28 |
boden | any ideas? | 14:28 |
*** roman_g has joined #openstack-infra | 14:28 | |
*** tosky has joined #openstack-infra | 14:29 | |
*** dpawlik has quit IRC | 14:29 | |
*** kjackal has quit IRC | 14:32 | |
*** kjackal has joined #openstack-infra | 14:32 | |
ssbarnea|bkp2 | AJaeger fungi : small css improvement on os-loganalyze (no more horizontal browsing on pip reqs listings): https://review.openstack.org/#/c/613383/ | 14:35 |
*** smarcet has joined #openstack-infra | 14:37 | |
boden | actually maybe it's because those dependencies are not in the requirements... I'll try that | 14:38 |
*** quiquell is now known as quiquell|off | 14:42 | |
*** carl_cai has quit IRC | 14:42 | |
fungi | boden: yeah, http://logs.openstack.org/29/612729/3/check/openstack-tox-py27/a00e36e/tox/py27-siblings.txt indicates to me that it didn't get installed (probably owing to it not being in the requirements as you noted) | 14:45 |
boden | fungi thanks... what makes you say it wasn't installed from that log.. I see "Sibling neutron at src/git.openstack.org/openstack/neutron" doesn't that mean it's already there?? just trying to understand for my own benefit | 14:46 |
fungi | mordred: is that ^ correct? would there be both a "sibling at path" line and a "found neutron python package installed" line in that log if it had been? | 14:46 |
mordred | fungi: reading | 14:47 |
fungi | boden: i interpreted that to mean that it sees src/git.openstack.org/openstack/neutron and is aware it's listed as a required-project but not necessarily that tox installed it into the resulting virtualenv | 14:47 |
boden | hmm ok.. thanks | 14:47 |
mordred | yes - a found sibling must already be in the requirements for it to be installed | 14:47 |
mordred | so if there are repos in required-projects but not listed in requirements.txt they will not be installed | 14:48 |
fungi | the pip freeze is here too which i think confirms it: http://logs.openstack.org/29/612729/3/check/openstack-tox-py27/a00e36e/tox/py27-5.log | 14:48 |
fungi | no neutron in the freeze output | 14:48 |
mordred | (this is to avoid things like pip install -e src/git.openstack.org/openstack/requirements - which is not what we'd want to have happen) | 14:48 |
fungi | looks like the only package installed from local source in that freeze is tricircle==5.1.1.dev35 | 14:49 |
fungi | boden: ^ | 14:50 |
fungi | ssbarnea|bkp2: thanks! that looks pretty straightforward | 14:50 |
boden | fungi: ack, got it... | 14:50 |
*** armstrong has joined #openstack-infra | 14:51 | |
*** ssbarnea|bkp2 has quit IRC | 14:51 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add the process environment to zuul.conf parser https://review.openstack.org/612824 | 14:51 |
*** rossella_s has quit IRC | 14:53 | |
corvus | infra-root: i'm going to be afk until wednesday | 14:56 |
*** cfriesen has joined #openstack-infra | 14:56 | |
fungi | thanks for the heads-up! i hope it's for fun reasons | 14:57 |
*** rpioso|afk is now known as rpioso | 14:58 | |
*** ssbarnea has joined #openstack-infra | 15:00 | |
*** diablo_rojo has joined #openstack-infra | 15:00 | |
*** dansmith is now known as SteelyDan | 15:01 | |
*** dave-mccowan has joined #openstack-infra | 15:04 | |
*** dave-mccowan has quit IRC | 15:10 | |
*** hasharAway is now known as hashar | 15:12 | |
openstackgerrit | Sorin Sbarnea proposed openstack-dev/pbr master: Correct documentation hyperlink for environment-markers https://review.openstack.org/613576 | 15:16 |
*** gyee has joined #openstack-infra | 15:23 | |
*** onovy has joined #openstack-infra | 15:31 | |
*** smarcet has quit IRC | 15:40 | |
*** apetrich has quit IRC | 15:44 | |
*** zul has quit IRC | 15:52 | |
clarkb | morning, having a slow start to the day and I need to run some errands so may be a bit before I'm actually around. I'd like to look at using the new compute resource usage logs to produce a report of some sort that shows usage by projects (and maybe by distro-release and other stuff if we can do it) | 15:52 |
*** agopi is now known as agopi|food | 15:52 | |
clarkb | it looks like zk is still happy and the node count has leveled off | 15:53 |
*** gothicmindfood has joined #openstack-infra | 15:54 | |
*** apetrich has joined #openstack-infra | 15:58 | |
*** lpetrut has quit IRC | 16:00 | |
*** ginopc has quit IRC | 16:02 | |
*** dtantsur is now known as dtantsur|afk | 16:07 | |
*** kjackal has quit IRC | 16:10 | |
*** e0ne has quit IRC | 16:10 | |
*** kopecmartin is now known as kopecmartin|off | 16:13 | |
*** bnemec is now known as beekneemech | 16:13 | |
dmsimard | does "flake8: noqa" no longer work ? I'm seeing pep8 failures that should be ignored | 16:18 |
openstackgerrit | Alex Schultz proposed openstack-infra/project-config master: Add noop to instack-undercloud https://review.openstack.org/613630 | 16:19 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 16:20 |
*** shardy has quit IRC | 16:20 | |
*** fried_rice is now known as fried_rolls | 16:20 | |
dmsimard | looks like it's "noqa" instead of "flake8: noqa" now | 16:21 |
dmsimard | ¯\_(ツ)_/¯ | 16:21 |
clarkb | dmsimard: it has always been just # noqa iirc | 16:23 |
fungi | i don't recall ever using "flake8: noqa" and only ever used "noqa" myself | 16:23 |
dmsimard | http://codesearch.openstack.org/?q=flake8%3A%20noqa&i=nope&files=&repos= | 16:24 |
fungi | interesting. i guess that must have worked at some point or else it's a really huge case of cargo-culting | 16:26 |
ssbarnea | same with me, only used # noqa ... when it was not really possible to avoid it. | 16:26 |
mordred | it definitely _used_ to work | 16:26 |
*** hashar is now known as hasharAway | 16:28 | |
fungi | skimming through http://flake8.pycqa.org/en/latest/release-notes/index.html it doesn't look like they ever deprecated it and i even see a reference to it in the latest 3.6.0 notes | 16:28 |
fungi | http://flake8.pycqa.org/en/latest/release-notes/3.6.0.html#features | 16:28 |
fungi | dmsimard: so it should still work? | 16:29 |
fungi | dmsimard: care to link to the failure in question? | 16:29 |
*** aojeagarcia has quit IRC | 16:30 | |
dmsimard | sure, hang on | 16:30 |
dmsimard | example: http://logs.openstack.org/99/613399/1/check/openstack-tox-pep8/6b14dee/job-output.txt.gz#_2018-10-25_23_54_59_038898 fixed by https://review.openstack.org/#/c/613634/ | 16:31 |
*** jpich has quit IRC | 16:31 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 16:31 |
*** mriedem is now known as mriedem_away | 16:34 | |
fungi | dmsimard: is "E261 at least two spaces before inline comment" perhaps the actual problem you ended up solving (inadvertently) there? | 16:35 |
dmsimard | yeah that's the first thing I tried | 16:36 |
fungi | you replaced "foo # flake8: noqa" lines with "foo  # noqa" (note the leading double-space) | 16:36 |
dmsimard | these failures sort of confused me to be honest because this code hasn't been touched in a very long time | 16:36 |
dmsimard | and it started failing just now | 16:36 |
fungi | did you suddenly switch to a newer flake8? | 16:37 |
dmsimard | not sure, to be fair there isn't exactly a lot of traffic on ara since everything is focused on the new 1.0 repos so it may be something that failed now but the cause dates back days/weeks | 16:38 |
fungi | flake8==3.6.0 | 16:38 |
fungi | that's the newest release from 3 days ago | 16:38 |
*** agopi|food is now known as agopi | 16:38 | |
fungi | and note the comment about noqa in the release notes i linked for 3.6.0 | 16:38 |
fungi | "Only skip a file if # flake8: noqa is on a line by itself (See also GitLab#453, GitLab!219)" | 16:39 |
fungi | so i take that to mean that prior to 3.6.0 it was skipping that whole file because at least one line had "# flake8: noqa" | 16:39 |
dmsimard | I think it was only meant to skip a particular line but don't quote me on that | 16:39 |
dmsimard | at least, that's my understanding of it | 16:40 |
fungi | yeah, see those gitlab links | 16:40 |
dmsimard | and from codesearch, it seems to be how projects are using it too | 16:40 |
fungi | which would explain why all those unrelated linting errors for that file suddenly popped up when switching to 3.6.0 | 16:40 |
dmsimard | oh! so it ignored the whole file instead of just the one line | 16:40 |
fungi | https://gitlab.com/pycqa/flake8/issues/453 | 16:40 |
dmsimard | which is in all likelihood not the original intent | 16:40 |
fungi | yeah, i think people were misusing it | 16:40 |
fungi | so les cultes du cargo at work | 16:41 |
dmsimard | well, if pep8 jobs start failing all over the place, we'll know why :D | 16:41 |
* fungi butchers french for your pleasure | 16:41 | |
dmsimard | maybe openstack-dev worthy | 16:41 |
fungi | yes, i think this will be of interest to openstack-dev ml | 16:41 |
dmsimard | I'll send something | 16:41 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 16:41 |
fungi | odds are few people have run into it yet because we generally pin linters for official openstack projects at the start of a cycle | 16:42 |
*** fuentess has joined #openstack-infra | 16:42 | |
fungi | so this will require a fair amount of cleanup from a lot of projects who were up to now doing the wrong thing and not realizing it | 16:43 |
*** trown is now known as trown|lunch | 16:43 | |
dmsimard | ++ | 16:43 |
*** sthussey has joined #openstack-infra | 16:44 | |
dmsimard | fungi: I don't see a pin on flake8 in openstack/requirements.. would that be elsewhere ? | 16:49 |
fungi | dmsimard: it's in each project. we omit linters from requirements tracking explicitly | 16:50 |
dmsimard | ah | 16:50 |
fungi | because different projects will want to raise their linter caps at their own pace | 16:51 |
dmsimard | makes sense | 16:53 |
*** derekh has quit IRC | 16:58 | |
*** bauwser is now known as bauzas | 17:01 | |
*** betherly has quit IRC | 17:03 | |
*** zul has joined #openstack-infra | 17:04 | |
*** jpena is now known as jpena|off | 17:06 | |
*** lpetrut has joined #openstack-infra | 17:08 | |
*** jamesmcarthur has quit IRC | 17:16 | |
*** electrofelix has quit IRC | 17:32 | |
*** ykarel|away has joined #openstack-infra | 17:34 | |
ssbarnea | fungi: no meeting in progress, good time to merge https://review.openstack.org/#/c/613022/ ? | 17:35 |
ssbarnea | if i remember well, the openstack approach regarding linting was to pin to hacking, which was pinning flake8, right? | 17:36 |
*** bobh has quit IRC | 17:36 | |
fungi | yes on both questions | 17:37 |
*** lbragstad is now known as elbragstad | 17:37 | |
fungi | 613022 could use a second infra-root reviewer though since i'm the only +2 on it | 17:37 |
*** Swami has joined #openstack-infra | 17:38 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Cleanup down ports https://review.openstack.org/609829 | 17:41 |
Shrews | ianw: addressed your comments in ^^^ | 17:42 |
*** smarcet has joined #openstack-infra | 17:49 | |
*** trown|lunch is now known as trown | 17:50 | |
*** xek has quit IRC | 17:58 | |
*** jamesmcarthur has joined #openstack-infra | 18:01 | |
AJaeger | config-core, please review https://review.openstack.org/613092 https://review.openstack.org/#/c/611893/ https://review.openstack.org/#/c/611892 | 18:10 |
*** armstrong has quit IRC | 18:10 | |
*** munimeha1 has quit IRC | 18:11 | |
*** mriedem_away is now known as mriedem | 18:28 | |
*** apetrich has quit IRC | 18:34 | |
*** apetrich has joined #openstack-infra | 18:35 | |
*** e0ne has joined #openstack-infra | 18:35 | |
openstackgerrit | Felipe Monteiro proposed openstack-infra/project-config master: Remove airship-armada jobs, as they are all in project https://review.openstack.org/611013 | 18:35 |
openstackgerrit | Merged openstack-infra/project-config master: Move openstack-browbeat zuul jobs to project repository https://review.openstack.org/613092 | 18:35 |
openstackgerrit | Merged openstack-infra/project-config master: New Repo - OpenStack-Helm Images https://review.openstack.org/611892 | 18:40 |
openstackgerrit | Sean McGinnis proposed openstack-dev/pbr master: Fix incorrect use of flake8:noqa https://review.openstack.org/613665 | 18:43 |
openstackgerrit | Merged openstack-infra/project-config master: New Repo: OpenStack-Helm Docs https://review.openstack.org/611893 | 18:47 |
*** jamesmcarthur has quit IRC | 18:48 | |
clarkb | fungi: any idea if there are any meetings we need to worry about for 613022? | 18:53 |
fungi | didn't sound like it, but i haven't checked | 18:53 |
fungi | ssbarnea seemed to think it was safe earlier when he brought it up | 18:54 |
clarkb | eavesdrop seems to think it is ok | 18:54 |
ssbarnea | what could go wrong? | 18:55 |
clarkb | ssbarnea: the meetbot is restarted when we change its config so if a meeting is running when that happens it will break the logging of that meeting | 18:55 |
ssbarnea | ahh, yeah. that is why i suspect weekends are the best times for that. i doubt we have official meetings during them. | 18:56 |
clarkb | the latest meeting on eavesdrop is 1500UTC | 18:57 |
clarkb | so I approved it (1900UTC now) | 18:57 |
*** armax has quit IRC | 19:05 | |
openstackgerrit | Alex Schultz proposed openstack-infra/project-config master: Add noop to instack-undercloud https://review.openstack.org/613630 | 19:06 |
*** fried_rolls is now known as efried | 19:07 | |
*** efried is now known as fried_rice | 19:11 | |
*** e0ne has quit IRC | 19:14 | |
*** jcoufal has quit IRC | 19:15 | |
*** dave-mccowan has joined #openstack-infra | 19:18 | |
*** toabctl has quit IRC | 19:19 | |
*** e0ne has joined #openstack-infra | 19:19 | |
*** smarcet has quit IRC | 19:19 | |
*** hasharAway is now known as hashar | 19:20 | |
*** toabctl has joined #openstack-infra | 19:21 | |
fungi | as long as it doesn't take several days to merge, we should be fine ;) | 19:22 |
*** e0ne has quit IRC | 19:23 | |
*** anticw has joined #openstack-infra | 19:26 | |
anticw | zuul/pipeline q ... is it possible to have a 3rd party gate that can test some but not all PS? and then have it +Verified at which point zuul will no longer spend effort testing a PS? | 19:27 |
*** harlowja has quit IRC | 19:27 | |
clarkb | anticw: third party testing can filter patchsets however they like. Not sure what you mean by the second bit. You want zuul to not test a patchset if it gets +1 from third party ci? if so that isn't possible because you need a +1 and +2 from zuul to merge code | 19:31 |
Shrews | clarkb: oh, forgot to answer your question from yesterday. no, nodepool should not reuse image names. it does, however, retry upload attempts that fail. perhaps a failure actually succeeded? | 19:33 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: Small script to scrape Zuul job cpu usage https://review.openstack.org/613674 | 19:35 |
clarkb | Shrews: interesting, that could be | 19:35 |
openstackgerrit | Merged openstack-infra/system-config master: Adding openstack-browbeat https://review.openstack.org/613022 | 19:36 |
*** lpetrut has quit IRC | 19:37 | |
openstackgerrit | Merged openstack-infra/zuul master: quick-start: add a note about github https://review.openstack.org/613398 | 19:42 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/zuul master: Add reenqueue utility https://review.openstack.org/613676 | 19:44 |
*** jamesmcarthur has joined #openstack-infra | 19:46 | |
clarkb | mentioned this over in the tc channel but got some simple scripting going to determine nodepool node usage rates by project | 19:48 |
clarkb | http://paste.openstack.org/show/733154/ produced by https://review.openstack.org/613674 | 19:48 |
clarkb | the breakdown is tripleo: ~50% of all cpu time, neutron: ~14% and nova ~5% | 19:48 |
clarkb | for the 13 hour period I scraped the logs for | 19:48 |
Shrews | 50%? wow. i wonder what percentage of our nodes that ends up being | 19:51 |
clarkb | Shrews: 50% | 19:51 |
Shrews | oh, i misunderstood what you meant by cpu time | 19:51 |
clarkb | ya sorry. the calculation is job runtime * number of nodes used | 19:52 |
clarkb | and that is 50% of what we used not 50% of theoretical max (though I think we were behind the entire 13 hour period so should be the same) | 19:52 |
*** kjackal has joined #openstack-infra | 19:55 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 20:03 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Rate limit updateNodeStats https://review.openstack.org/613680 | 20:03 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Rate limit updateNodeStats https://review.openstack.org/613680 | 20:09 |
fungi | clarkb: thanks, sounds like it roughly matches up with what we expected | 20:10 |
clarkb | fungi: ya no surprises for me other than nova and neutron are lower than they were a year ish a go when I ran the numbers | 20:11 |
clarkb | but they still seem to be right near the top | 20:11 |
clarkb | fungi: the other piece of info I find interesting is kolla and osa vs tripleo | 20:11 |
clarkb | (are they not testing enough or is tripleo just incredibly inefficient, maybe both) | 20:11 |
clarkb | also 374 days of testing in 13 hours | 20:12 |
*** ansmith_ has quit IRC | 20:12 | |
clarkb | notmyname also pointed out that activity is likely to factor in. Particularly so since I only had a small window of data | 20:12 |
clarkb | I think if we can look at a months worth over say the month of november we'll have a much better overall picture | 20:13 |
fungi | yeah, this is really early to be drawing detailed conclusions | 20:15 |
fungi | also what will be more interesting is not the snapshot but the trends over time | 20:15 |
fungi | is that 50% decreasing? and how quickly? that will be interesting to | 20:15 |
fungi | find out | 20:15 |
clarkb | yup | 20:16 |
clarkb | and whether or not it syncs up with the release cycle in interesting ways | 20:16 |
clarkb | or is random etc | 20:16 |
fungi | i mean, they're already aware it's a concern and are working to improve the situation. now we can tell them how effective their attempts are to that end | 20:16 |
fungi | which is far more interesting to me than blamethrowing and witch hunts | 20:16 |
anticw | clarkb: there is no way to have an external bot +2 in which case zuul doesn't need to test? | 20:19 |
clarkb | anticw: not in the current system. Zuul is our gatekeeper and it doesn't know how to share those duties with another system | 20:20 |
clarkb | (I think that is intentional fwiw not a bug) | 20:20 |
clarkb | the reason for that is zuul has to ensure that the changes going through it don't break zuul itself | 20:20 |
anticw | clarkb: would it be perverse then to have the zuul job check an external gate for status and short-circuit to OK? | 20:23 |
clarkb | anticw: it might be better to try and understand what it is you are trying to do more concretely? What does the third party job do? Does it have to be third party? etc | 20:24 |
anticw | openstack helm jobs are involved and take a long time, i'm asking people if we can do some of this work and avoid hitting the gates so hard | 20:24 |
anticw | there are also sometimes quite long delays before a job will run (3-4 hours isn't uncommon) | 20:24 |
*** zul has quit IRC | 20:25 | |
anticw | concretely, i'm looking at zuul right now for 613611,1 for example, we're 5 hours 14 minutes into it | 20:26 |
clarkb | anticw: that is relevant to the discussion fungi and I were just having above. In the last 13 hours openstack-helm was .8% of our resource consumption | 20:26 |
clarkb | anticw: put another way helm isn't hitting the gates so hard | 20:26 |
clarkb | (so we shouldn't expect helm moving third party to change the backlog situation dramatically) | 20:27 |
*** kjackal has quit IRC | 20:27 | |
anticw | ok good to know ... so are we doing things poorly that are causing delays? these delays aren't new | 20:27 |
anticw | also, we get a lot of post-failures | 20:27 |
anticw | (it's better in the last week after some refactoring but still not very fast) | 20:27 |
clarkb | anticw: no the delays are due to total demand, we have a fixed number of test resources and people trying to test far more than we can keep up with (tripleo is ~50% over the last 13 hours for example) | 20:28 |
*** armax has joined #openstack-infra | 20:28 | |
clarkb | anticw: the ways to improve the backlog are either to reduce demand (fix bugs in software to reduce gate resets and number of failures that are "invalid") or to increase the number of resources we have | 20:28 |
*** jamesmcarthur has quit IRC | 20:29 | |
clarkb | anticw: are you having post failures due to timeouts? | 20:29 |
anticw | sometimes, unclear why things are slow | 20:29 |
clarkb | anticw: examples would be good if you have them because post failures can happen for a number of reasons | 20:29 |
anticw | https://review.openstack.org/#/c/613356/ | 20:30 |
anticw | just taking a recent job | 20:30 |
anticw | also, things run slower than i would expect ... if i run the jobs on a VM locally ... on very old hardware (very old) and slow rotating disks, the gate jobs for me run in about half the time | 20:30 |
fungi | anticw: the thread starting at http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-October/000575.html is also probably relevant to your concerns | 20:30 |
anticw | sometimes less | 20:30 |
anticw | testing the gate jobs in aws and azure the timing is even better than my local test | 20:31 |
clarkb | anticw: do you know what sort of resources you are constrained by? are you running dstat or similar so that we can see what the hold up is? | 20:31 |
fungi | anticw: are your slow-running jobs reliant on nested virtualization performance, perhaps? | 20:31 |
clarkb | anticw: fwiw kata switched from azure to vexxhost (one of our providers) and the runtime was cut in half? something like that | 20:31 |
anticw | fungi: no | 20:31 |
clarkb | anticw: likely important to identify what the resource contention is if we want to improve it | 20:31 |
anticw | clarkb: i'm guessing we're badly IO limited in some but not all cases | 20:32 |
anticw | certainly some builder infra (as identified by hostname which might be bogus) seem worse than others | 20:32 |
clarkb | anticw: the last time this came up with OSH I had asked for more logging and data like dstat. Any idea if we have that? It's easy to point and say "this is bad, aws is better" but I can't make that actionable | 20:32 |
anticw | clarkb: we have 'more logging' but i don't know that it's enough to pin point it just yet | 20:33 |
anticw | srwilkers: ^ ? | 20:33 |
*** ykarel|away has quit IRC | 20:34 | |
anticw | not entirely useful but https://pastebin.com/HGKCEGgJ is a grep from the job running locally | 20:35 |
anticw | that shows the job ran in about 15 minutes ... again ... on a VM on pretty old (2010) hardware | 20:35 |
clarkb | anticw: that particular post failure appears to be due to one of the instances not being reachable at the end. It appears that the job failed properly earlier in the job due to mariadb not starting (possibly because it was supposed to run on the non-responsive instance?) | 20:35 |
anticw | using the aforementioned url i think that took 1h 2 mins on a gate | 20:36 |
clarkb | anticw: ish, but it timed out waiting for a thing to happen that never happened. I don't think that timeout was due to slowness but instead due to network communication problems | 20:36 |
clarkb | (still not good, but important to identify the issue) | 20:36 |
anticw | clarkb: ok, network issues is something that's been pointed out before | 20:36 |
anticw | i'm not really sure what those would be ... and why some builders would have them | 20:37 |
anticw | again, i tried aws and azure as reference points and for the most part they were rock solid and considerably faster (2x to 4x) | 20:37 |
clarkb | right and I'm asking you to help us identify why that is the case so that we can hopefully improve the situation with our zuul | 20:38 |
anticw | yeah, so if it's networking ... what do you suggest to help there? | 20:38 |
fungi | network connectivity between instances in some cloud providers can vary in quality, for sure. that's been one of our biggest challenges for overall reliability of jobs | 20:38 |
anticw | fungi: ok, but ... we're not doing a lot of networking | 20:38 |
clarkb | anticw: I don't know that networking was slow. It appears networking didn't work at all for one of your 5 instances. Those two issues may be orthogonal to each other | 20:38 |
anticw | and networking between VMs in providers isn't a new thing | 20:38 |
fungi | um, yes i'm quite aware | 20:39 |
anticw | clarkb: others have claimed networking issues as well | 20:39 |
fungi | is your problem a new thing? | 20:39 |
anticw | no, not new | 20:39 |
clarkb | fungi: no we tried debugging this a while back | 20:39 |
clarkb | I asked for logging and never got any | 20:39 |
fungi | okay, just figuring out what you mean by "networking between VMs in providers isn't a new thing" | 20:39 |
anticw | just the number of checks required has increased so transient failures bite more now | 20:39 |
clarkb | because unfortunately we weren't logging why the containers were failing to start | 20:39 |
clarkb | just that they had failed | 20:39 |
fungi | anticw: and this is a 5-node job? | 20:39 |
anticw | fungi: i'm saying whilst i accept networking might be an issue ... in this day and age that seems surprising | 20:39 |
anticw | the above example was yes | 20:40 |
fungi | oh, then prepare to be surprised | 20:40 |
fungi | cloud providers love to under^Wright-size their network gear | 20:40 |
fungi | and it gets saturated massively for some periods | 20:40 |
anticw | yeah ... but even so ... networking got super fast and super cheap ... you'd really have to put effort in to make it that poor | 20:40 |
anticw | 10GbE+ is basically free at this stage ... i have a box of 10G nics someone gave me even | 20:41 |
fungi | has nothing to do with effort making network gear slow and everything to do with noisy neighbors sharing network resources with you | 20:41 |
clarkb | ok I've confirmed the node that had ssh connectivity issues in ara is the one that was trying to host the failed mariadb container | 20:41 |
fungi | and if cloud providers are using servers from 2010, imagine that their network gear can easily be of the same vintage | 20:41 |
anticw | clarkb: how did you verify that? | 20:41 |
anticw | fungi: i'm using 2010 hardware, i imagine they are using something less ancient | 20:42 |
fungi | ahh, in some cases they aren't (or not much newer anyway) | 20:42 |
*** smarcet has joined #openstack-infra | 20:42 | |
*** openstackstatus has quit IRC | 20:42 | |
*** openstack has joined #openstack-infra | 20:44 | |
*** ChanServ sets mode: +o openstack | 20:44 | |
clarkb | anticw: I can't use that data to say that would cause slowness (I don't know enough about your tests), but I am fairly confident that is why it failed | 20:44 |
anticw | i guess our tests take a long time | 20:44 |
anticw | which makes things worse | 20:44 |
fuentess | clarkb: hi Clark, can you help me disable the kata Fedora job for the proxy repo? We still have some issues with Fedora on vexxhost, so would be good to disable it until we resolve them | 20:45 |
anticw | more likely to have some sort of glitch somewhere the longer we run | 20:45 |
clarkb | fuentess: yup | 20:45 |
anticw | clarkb: i don't really know how to instrument cpu/io on zuul VMs but i could locally ... is that useful? | 20:46 |
clarkb | fuentess: https://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n45 is the section of code to edit. Remove the line for the fedora job | 20:46 |
clarkb | anticw: we've used dstat for a long time with things like devstack + tempest jobs | 20:46 |
fuentess | clarkb: ohh cool, thanks | 20:46 |
anticw | yeah, we could have dstat log in the background | 20:47 |
clarkb | anticw: captures io (network and disk), memory use, cpu usage etc every second iirc | 20:47 |
clarkb | anticw: and there are tools to render the data into more human friendly formats like stackviz | 20:47 |
fungi | i wonder if we even already have a dstat role for starting it early in the job and then collecting its logs | 20:47 |
clarkb | (though I think stackviz dstat rendering is broken right now) | 20:47 |
clarkb | fungi: that type of work has been ongoing in zuul land | 20:47 |
clarkb | fungi: no definitive answer yet but progress I think | 20:48 |
fungi | yeah, i could see that being generally useful across a broad variety of jobs | 20:48 |
anticw | re: multinode i could spit out a DS that does NxN network pings and have that log i guess | 20:49 |
anticw | (ping in a generic sense) | 20:49 |
anticw | people like mellanox who have their own CI ... does that also require zuul for merges? | 20:50 |
fungi | that might at least allow you to also short-circuit your job early if one of your expected nodes in the multinode set becomes unreachable | 20:50 |
anticw | (i forget where i saw this, some typo on a PS# and it popped up once) | 20:50 |
clarkb | anticw: the other thing to keep in mind is that the jobs themselves can crash the networking on the host too (I have no idea if that is happening here) | 20:50 |
clarkb | either by updating the firewall improperly or applying new config to interfaces that won't work within a provider. We've seen both things happen with jobs in the past | 20:51 |
anticw | clarkb: how does networking on the host crash? that seems like it should be pretty rare | 20:51 |
anticw | that's the sort of thing i would expect on a c-64 | 20:51 |
fungi | "crash" is a relative term here ;) | 20:51 |
clarkb | anticw: crash in the sense it stops working not kernel panic crash | 20:51 |
clarkb | we've had jobs use invalid network ranges and apply them to the actual host interfaces | 20:52 |
clarkb | that will break things fast | 20:52 |
fungi | or you could do something to inadvertently flush the iptables rules suddenly blocking all traffic on that instance | 20:52 |
clarkb | we've also had jobs apply firewall rules that prevent ssh ya ^ | 20:52 |
fungi | or something could simply cause the service you're trying to talk to on that node to die and not restart | 20:52 |
anticw | we used to use a lot of memory | 20:52 |
anticw | that's better now but not ideal | 20:52 |
fungi | yes, oom killer knocking out a crucial service is not unusual | 20:53 |
anticw | our stuff is kind of bloaty :( | 20:53 |
anticw | i don't know that we get oom, mostly just poor IO performance (lack of page cache hits) | 20:53 |
fungi | are your jobs setting up swap memory? if not, that will lead to oom faster than you expect | 20:53 |
clarkb | fungi: I think we do that in the base job now? | 20:54 |
clarkb | but it might be devstack specific? | 20:54 |
anticw | fungi: swap will cause k8s to cry, though can be worked around | 20:55 |
clarkb | usually you want swap not because you expect the job will succeed, but because swap will allow you to get the necessary data to diagnose problems that happen when memory runs out | 20:55 |
fungi | wow, really? kubernetes is allergic to swap memory? that seems strange | 20:56 |
clarkb | fungi: likely as much as anything else is like mysql or kvm | 20:56 |
clarkb | things will get really slow and stop working within timeouts | 20:56 |
fungi | i mean, obviously you don't want active tasks paging out memory they're still accessing, but it can give you breathing room for other background processes to get paged out | 20:56 |
anticw | kinda, there is a long thread/debate about it and i'm just gonna get angry if i get into it :) | 20:57 |
fungi | fair! ;) | 20:57 |
*** hashar has quit IRC | 20:57 | |
anticw | i also had someone back into me an hour ago so am a bit grouchy | 20:57 |
anticw | into my car i mean | 20:57 |
clarkb | in any case I think the short term answer here is it would be great to get more log data if possible. Understanding what resource contentions you do have so that we can at least attempt to address them would be good | 20:58 |
anticw | i like the idea of a long running dstat ... or netstat | 20:59 |
anticw | i think that would be useful | 20:59 |
anticw | and some sort of networking sanity checker | 20:59 |
anticw | it might also be we're just asking too much from the VMs and should move entirely to a 3rd party gate (if possible) | 20:59 |
clarkb | yup that is possible too, but hard to say without data like ^ | 21:00 |
anticw | clarkb: well, one old data point is that when i run a test locally it needs over 8GB ... how it even runs on the gates i'm not sure | 21:00 |
clarkb | as for third party gating I think your hack is the closest you will get. Zuul has to gate its config changes | 21:00 |
anticw | we would still have to wait for things to work through the queue though | 21:01 |
anticw | even if we fall out in 20s | 21:01 |
clarkb | anticw: jobs that just want to do an http request don't need to use a nodeset with nodes. They can run directly on the executor | 21:01 |
clarkb | it is a very constrained environment and we use it for stuff like retrigger read the docs builds | 21:02 |
anticw | clarkb: yeah, but we might not know if we were able to test it in some cases | 21:02 |
clarkb | also tripleo has a plan to reduce test resource needs as well as make tests more reliable. Here is hoping that improves the demand side of things for them | 21:02 |
anticw | it would require us to have thorough and robust external gates, i was thinking more "if we can..." | 21:03 |
*** trown is now known as trown|outtypewww | 21:03 | |
openstackgerrit | Salvador Fuentes Garcia proposed openstack-infra/project-config master: Remove Fedora Job for Kata project https://review.openstack.org/613690 | 21:03 |
clarkb | fuentess: thanks, I went ahead and approved it | 21:03 |
fuentess | clarkb: thank you | 21:04 |
*** jamesdenton has quit IRC | 21:11 | |
*** fuentess has quit IRC | 21:12 | |
*** eharney has quit IRC | 21:12 | |
*** jamesmcarthur has joined #openstack-infra | 21:13 | |
fungi | wow, even the new zuul status ui is taking a while to load for me | 21:22 |
fungi | tripleo changes currently account for 75% of the changes in the gate pipeline | 21:23 |
*** jamesmcarthur has quit IRC | 21:24 | |
fungi | and roughly a third of them are indicating job failures | 21:24 |
fungi | or merge conflicts or dependency on a failed change | 21:25 |
fungi | i should say roughly a third of the changes near the top of their gate queue anyway | 21:25 |
fungi | looks like the wait for node requests in the check pipeline is a little over 5 hours at this point | 21:26 |
*** boden has quit IRC | 21:26 | |
fungi | we're hovering around 700 nodes in use at the moment | 21:33 |
fungi | with another ~100 building/deleting | 21:34 |
*** jbadiapa has quit IRC | 21:34 | |
clarkb | ya thats about right with inap disabled | 21:34 |
fungi | and still down half of ovh right? | 21:34 |
clarkb | no ovh is up | 21:36 |
clarkb | and rarely leaking ports in gra1 | 21:37 |
*** rlandy has quit IRC | 21:41 | |
*** mriedem has quit IRC | 21:52 | |
*** armax has quit IRC | 21:53 | |
*** jlvillal has joined #openstack-infra | 22:09 | |
*** ansmith_ has joined #openstack-infra | 22:16 | |
openstackgerrit | Alex Schultz proposed openstack-infra/project-config master: Add noop to instack-undercloud https://review.openstack.org/613630 | 22:19 |
*** tpsilva has quit IRC | 22:27 | |
*** bobh has joined #openstack-infra | 22:28 | |
*** armax has joined #openstack-infra | 22:59 | |
*** agopi has quit IRC | 23:02 | |
*** pcaruana has quit IRC | 23:10 | |
*** rh-jelabarre has quit IRC | 23:18 | |
*** diablo_rojo has quit IRC | 23:40 | |
*** Swami has quit IRC | 23:43 | |
*** rcernin has joined #openstack-infra | 23:43 | |
*** jesusaur has quit IRC | 23:45 | |
*** smarcet has quit IRC | 23:45 | |
*** rcernin has quit IRC | 23:51 | |
*** kgiusti has left #openstack-infra | 23:51 | |
*** gyee has quit IRC | 23:55 |