*** mrodden has quit IRC | 00:01 | |
clarkb | mordred: I left a comment on the change, I think I managed to express my concern properly, but let me know if it isn't clear | 00:02 |
*** vipul is now known as vipul-away | 00:04 | |
*** openstackgerrit has quit IRC | 00:04 | |
*** openstackgerrit has joined #openstack-infra | 00:05 | |
*** pcm_ has quit IRC | 00:06 | |
*** krtaylor has joined #openstack-infra | 00:07 | |
*** weshay has quit IRC | 00:09 | |
*** mrodden has joined #openstack-infra | 00:14 | |
openstackgerrit | Dan Bode proposed a change to openstack-infra/config: Add stackforge project: puppet_openstack_builder https://review.openstack.org/51079 | 00:15 |
*** alexpilotti has quit IRC | 00:15 | |
clarkb | mordred: https://review.openstack.org/#/c/33926/5 if I +2 that do you want to babysit an approval? | 00:15 |
clarkb | I need to drop offline here for a bit in order to get more stuff done prior to the weekend | 00:16 |
* clarkb AFKs to do that. I did +2 the change. I think it just needs a sanity check once in so that the next gerrit restart doesn't go sideways | 00:17 | |
*** melwitt has quit IRC | 00:17 | |
*** vipul-away is now known as vipul | 00:17 | |
*** dripton has joined #openstack-infra | 00:17 | |
*** melwitt has joined #openstack-infra | 00:19 | |
*** alchen99 has quit IRC | 00:20 | |
openstackgerrit | A change was merged to openstack-infra/config: Document how to delete a pad from Etherpad Lite https://review.openstack.org/46329 | 00:20 |
*** CaptTofu has quit IRC | 00:21 | |
*** CaptTofu has joined #openstack-infra | 00:21 | |
*** hogepodge has quit IRC | 00:22 | |
openstackgerrit | Dan Bode proposed a change to openstack-infra/config: Add stackforge project: puppet_openstack_builder https://review.openstack.org/51079 | 00:22 |
*** amotoki has joined #openstack-infra | 00:23 | |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Add repo scm https://review.openstack.org/45165 | 00:23 |
*** dripton has quit IRC | 00:25 | |
*** matsuhashi has joined #openstack-infra | 00:27 | |
*** senk has joined #openstack-infra | 00:27 | |
*** oubiwann_ has quit IRC | 00:27 | |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Improve fallback to master branch https://review.openstack.org/49894 | 00:27 |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Revert "Revert "Enable q-vpn service"" https://review.openstack.org/50242 | 00:27 |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Conditionally override PyPI for reqs integration https://review.openstack.org/50198 | 00:27 |
*** dripton has joined #openstack-infra | 00:33 | |
*** gyee has quit IRC | 00:35 | |
*** dripton has quit IRC | 00:42 | |
*** sandywalsh_ has joined #openstack-infra | 00:42 | |
*** sandywalsh has quit IRC | 00:43 | |
*** nosnos has joined #openstack-infra | 00:44 | |
*** dripton has joined #openstack-infra | 00:45 | |
*** krtaylor has quit IRC | 00:57 | |
*** sarob has joined #openstack-infra | 00:58 | |
*** senk has quit IRC | 01:02 | |
*** wenlock has quit IRC | 01:07 | |
*** sarob has quit IRC | 01:08 | |
*** sarob has joined #openstack-infra | 01:09 | |
*** melwitt has quit IRC | 01:12 | |
*** DennyZhang has joined #openstack-infra | 01:12 | |
*** senk has joined #openstack-infra | 01:14 | |
stevebaker | hey, are there some permissions I need to review heat proposals on http://summit.openstack.org/ ? | 01:18 |
*** markmcclain has joined #openstack-infra | 01:23 | |
*** mriedem has joined #openstack-infra | 01:24 | |
lifeless | PTL | 01:26 |
*** yaguang has joined #openstack-infra | 01:27 | |
*** yaguang has quit IRC | 01:27 | |
*** yaguang has joined #openstack-infra | 01:28 | |
*** basha has joined #openstack-infra | 01:31 | |
*** senk has quit IRC | 01:39 | |
*** chris613 has quit IRC | 01:48 | |
*** guohliu has quit IRC | 01:49 | |
*** jhesketh__ has quit IRC | 01:57 | |
*** wenlock has joined #openstack-infra | 02:01 | |
*** jhesketh has joined #openstack-infra | 02:02 | |
*** ArxCruz has joined #openstack-infra | 02:05 | |
*** dkranz has joined #openstack-infra | 02:08 | |
*** ArxCruz_ has joined #openstack-infra | 02:12 | |
*** fifieldt has joined #openstack-infra | 02:13 | |
*** xchu has joined #openstack-infra | 02:14 | |
*** ArxCruz has quit IRC | 02:15 | |
*** sarob has quit IRC | 02:15 | |
*** ArxCruz_ has quit IRC | 02:20 | |
*** alchen99 has joined #openstack-infra | 02:24 | |
*** krtaylor has joined #openstack-infra | 02:26 | |
*** alchen99 has quit IRC | 02:36 | |
*** senk has joined #openstack-infra | 02:40 | |
*** guohliu has joined #openstack-infra | 02:43 | |
*** locke105 has quit IRC | 02:44 | |
*** crank has quit IRC | 02:44 | |
*** kpepple has quit IRC | 02:44 | |
*** alaski has quit IRC | 02:44 | |
*** mkerrin has quit IRC | 02:44 | |
*** guitarzan has quit IRC | 02:44 | |
*** Reapster has quit IRC | 02:44 | |
*** Vivek has quit IRC | 02:44 | |
*** davidlenwell has quit IRC | 02:44 | |
*** BobBall has quit IRC | 02:44 | |
*** Ng has quit IRC | 02:44 | |
*** alaski_ has joined #openstack-infra | 02:44 | |
*** BobBall has joined #openstack-infra | 02:44 | |
*** guitarzan has joined #openstack-infra | 02:44 | |
*** Reapster has joined #openstack-infra | 02:44 | |
*** crank has joined #openstack-infra | 02:44 | |
*** Vivek has joined #openstack-infra | 02:44 | |
*** kpepple has joined #openstack-infra | 02:44 | |
*** Ng has joined #openstack-infra | 02:44 | |
*** locke105 has joined #openstack-infra | 02:44 | |
*** senk has quit IRC | 02:44 | |
*** Vivek is now known as Guest86586 | 02:45 | |
*** mkerrin has joined #openstack-infra | 02:45 | |
*** davidlenwell has joined #openstack-infra | 02:45 | |
*** erfanian has joined #openstack-infra | 02:49 | |
*** mriedem has quit IRC | 02:49 | |
*** matsuhashi has quit IRC | 02:57 | |
lifeless | mordred: you might care about https://bugs.launchpad.net/tripleo/+bug/1222306 | 02:57 |
uvirtbot | Launchpad bug 1222306 in tripleo "can't install keystone with pypi mirror" [Medium,Triaged] | 02:57 |
lifeless | mordred: or https://bugs.launchpad.net/tripleo/+bug/1222308 | 02:57 |
uvirtbot | Launchpad bug 1222308 in tripleo "can't install cinderclient with pypi mirror" [Medium,Triaged] | 02:57 |
*** HenryG has joined #openstack-infra | 02:58 | |
clarkb | lifeless: we really should require <0.8alpha or whatever the lowest 0.8 version is | 02:59 |
lifeless | clarkb: of requests? | 03:00 |
*** basha has quit IRC | 03:00 | |
clarkb | lifeless: sqlalchemy | 03:00 |
clarkb | it's silly we can't just say <0.8 | 03:01 |
mordred | ah. fascinating | 03:01 |
mordred | clarkb: we can with pip 1.4 | 03:01 |
lifeless | clarkb: oh right, there are two distinct bugs | 03:01 |
clarkb | mordred: right, but everyone else doesn't do new pip | 03:01 |
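The pinning problem clarkb and mordred are circling can be sketched as hypothetical requirements lines (the exact version numbers are illustrative, not taken from the real global-requirements file):

```
# Hypothetical requirements.txt lines.
# Under pip < 1.4, "<0.8" can still admit pre-releases such as 0.8b1,
# so the bound has to sit below the earliest 0.8 pre-release:
SQLAlchemy>=0.7.8,<=0.7.99
# pip >= 1.4 skips pre-releases unless --pre is passed, so there
# the simpler form works:
# SQLAlchemy<0.8
```

This is why clarkb suggests "<0.8alpha or whatever the lowest 0.8 version is" for consumers still on older pip.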
lifeless | mordred: yeah, I found this testing --offline with a fresh mirror | 03:01 |
lifeless | mordred: so this is in the 'stuff we don't mirror in' category | 03:01 |
lifeless | the problem is global requirements doesn't list all the different requirements all releases of clients had | 03:02 |
mordred | lifeless: yah. https://review.openstack.org/#/q/topic:openstack/requirements,n,z | 03:02 |
clarkb | mordred: https://review.openstack.org/#/c/51053/ | 03:03 |
mordred | clarkb: I think we have a script bug: https://review.openstack.org/#/c/49201/ | 03:03 |
mordred | look at the commit message | 03:03 |
*** flaper87|afk has quit IRC | 03:03 | |
*** flaper87|afk has joined #openstack-infra | 03:03 | |
clarkb | we do, 51053 fixes it :) | 03:03 |
*** mkerrin has quit IRC | 03:03 | |
*** mkerrin has joined #openstack-infra | 03:03 | |
*** HenryG has quit IRC | 03:03 | |
*** HenryG has joined #openstack-infra | 03:03 | |
lifeless | mordred: I'm not sure how that will fix the issue | 03:03 |
lifeless | mordred: we're installing releases | 03:03 |
mordred | done | 03:04 |
mordred | what? | 03:04 |
lifeless | mordred: when we pip install nova trunk | 03:04 |
lifeless | mordred: we get a release of python-neutronclient | 03:04 |
mordred | yah | 03:04 |
mordred | k | 03:04 |
* mordred bats eyelashes | 03:04 | |
lifeless | mordred: if the current requirements rules don't bring down versions that match the requirements when the release of that client was cut | 03:05 |
mordred | all of the projects should merge all of those changes and then cut releases | 03:05 |
clarkb | mordred: thanks. I also made sure to document why that horrible read into a variable trick is used | 03:05 |
mordred | hrm. ok | 03:05 |
mordred | lifeless: I grok what you are saying | 03:05 |
clarkb | because I keep forgetting why we did that and I don't want to have to remember | 03:05 |
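The "horrible read into a variable trick" clarkb documents is most likely the bash `read -d ''` idiom; a sketch under that assumption (the file name here is made up):

```shell
# Sketch of the bash "read a whole file into a variable" idiom
# (an assumption about the trick being referenced; file is hypothetical).
printf 'line1\nline2\n\n' > /tmp/example.txt

# Command substitution strips ALL trailing newlines:
stripped=$(</tmp/example.txt)

# read -r -d '' preserves the content byte-for-byte, but returns
# nonzero at EOF -- which is what makes the idiom look "horrible"
# and forces the trailing || true.
IFS= read -r -d '' kept < /tmp/example.txt || true

printf '%s' "$stripped" | wc -c   # 11 bytes: trailing newlines dropped
printf '%s' "$kept" | wc -c       # 13 bytes: file content intact
```

The nonzero exit status is exactly the kind of surprise worth a comment in the script, which is presumably why it kept needing to be re-remembered.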
lifeless | mordred: I don't claim to have an answer yet | 03:05 |
lifeless | mordred: just thought you should have it in your thinking cap | 03:05 |
openstackgerrit | A change was merged to openstack-infra/config: Use a single change ID per requirement proposal. https://review.openstack.org/51053 | 03:05 |
mordred | lifeless: I think this may fall in to the category of things that jeblair was worried about in terms of enabling use of our mirror for non-gate activities | 03:05 |
mordred | lifeless: which is to say, I think it may have some design holes | 03:06 |
lifeless | mordred: we're not using your mirror yet | 03:06 |
lifeless | mordred: this is a fresh run-mirror'd mirror | 03:06 |
mordred | lifeless: yup. I grok. but the mirror script is designed to keep a running mirror | 03:06 |
lifeless | mordred: right, ack. | 03:06 |
mordred | lifeless: thinking cap on - btw | 03:06 |
mordred | this is my way of thinking | 03:06 |
lifeless | once we get sophisticated enough in our CI | 03:07 |
lifeless | we'll spin up new mirrors as part of the test | 03:07 |
lifeless | and detect this | 03:07 |
mordred | I will be honest - my most recent thinking has been to investigate use of devpi | 03:07 |
lifeless | s/the test/a test/ | 03:07 |
lifeless | mordred: fully offline is very attractive for dc bringup stories | 03:07 |
mordred | yup. devpi has fully offline | 03:07 |
lifeless | mordred: so I'm not super keen on devpi | 03:07 |
lifeless | mordred: I thought it only captured what you used? | 03:08 |
mordred | it also has pockets | 03:08 |
mordred | so you can have a "mirror upstream" pocket, and a "my local stuff" which depends on the "mirror upstream" | 03:08 |
lifeless | mordred: so devpi would demonstrate the same failure mode | 03:08 |
mordred | so pointing at my local stuff will get you both | 03:08 |
mordred | lifeless: yes. I'm just saying | 03:08 |
lifeless | ok, tangent, sure. | 03:08 |
mordred | I've been thinking that richer implementation scripting might be better served at this point by devpi instead of pypi-mirror | 03:09 |
mordred | BUT | 03:09 |
mordred | I support the goal you are expressing | 03:09 |
lifeless | cool | 03:09 |
mordred | ish | 03:09 |
mordred | sort of | 03:09 |
mordred | I mean | 03:09 |
mordred | yeah | 03:09 |
lifeless | so I suspect we're going to be gating a different scenario than the gate currently does | 03:09 |
mordred | yup | 03:09 |
lifeless | I'm thinking I should mail the list when we're in sight of success | 03:09 |
lifeless | and get discussion | 03:09 |
lifeless | and/or a session in the CFP at the project level I guess | 03:09 |
mordred | oy | 03:10 |
clarkb | mordred: are you thinking we should use devpi for our mirror too? | 03:10 |
*** dims has quit IRC | 03:10 | |
mordred | clarkb: toying with the idea | 03:10 |
mordred | clarkb: the fact that it supports multiple sets of things | 03:10 |
mordred | clarkb: and local uploads | 03:10 |
mordred | but also linking things | 03:10 |
mordred | is very attractive | 03:11 |
mordred | downside: it serves things from python instead of apache | 03:11 |
clarkb | right, I was just going to ask about that | 03:11 |
mordred | yup. that's the asinine part | 03:11 |
mordred | but also the part that allows you to describe sets that depend on other sets | 03:11 |
mordred | so, you know, feature. bug. | 03:11 |
mordred | also - I'm thrilled that 3rd party testing has finally caught on | 03:13 |
mordred | it only took a year | 03:13 |
mordred | maybe a year and a half | 03:13 |
clarkb | mordred: so I was thinking about swift logs and realized we should just put our mirror in swift too | 03:13 |
mordred | how long have we been doing this? | 03:13 |
mordred | clarkb: totes | 03:13 |
clarkb | mordred: then we can manage a single index.html file | 03:13 |
clarkb | and maybe not even that | 03:13 |
clarkb | mordred: nova is requiring it for their hypervisors | 03:14 |
clarkb | mordred: I think ssh will always be the way to go for third party testing (because event stream > polling) | 03:15 |
*** wenlock_ has joined #openstack-infra | 03:18 | |
*** wenlock has quit IRC | 03:19 | |
*** wenlock_ is now known as wenlock | 03:19 | |
mordred | ++ | 03:20 |
mordred | amazing how russellb telling people they have to do it or they're going to get dropped gets further than us offering that they can do it and people can track the quality of their driver | 03:20 |
*** matsuhashi has joined #openstack-infra | 03:24 | |
*** matsuhashi has quit IRC | 03:31 | |
*** matsuhashi has joined #openstack-infra | 03:32 | |
*** guitarzan has quit IRC | 03:34 | |
*** alaski_ has quit IRC | 03:34 | |
*** dkranz has quit IRC | 03:34 | |
*** jhesketh has quit IRC | 03:34 | |
*** nosnos has quit IRC | 03:34 | |
*** michchap has quit IRC | 03:34 | |
*** uvirtbot has quit IRC | 03:34 | |
*** Ryan_Lane has quit IRC | 03:34 | |
*** SlickNik has quit IRC | 03:34 | |
*** freyes has quit IRC | 03:34 | |
*** mkoderer has quit IRC | 03:34 | |
*** slong has quit IRC | 03:34 | |
*** guitarzan has joined #openstack-infra | 03:34 | |
*** alaski has joined #openstack-infra | 03:34 | |
*** freyes has joined #openstack-infra | 03:34 | |
*** mkoderer_ has joined #openstack-infra | 03:34 | |
*** SlickNik has joined #openstack-infra | 03:34 | |
*** dkranz has joined #openstack-infra | 03:34 | |
*** slong has joined #openstack-infra | 03:34 | |
*** jhesketh has joined #openstack-infra | 03:34 | |
*** nosnos has joined #openstack-infra | 03:34 | |
*** michchap has joined #openstack-infra | 03:34 | |
*** Ryan_Lane has joined #openstack-infra | 03:35 | |
*** Ryan_Lane has quit IRC | 03:35 | |
*** Ryan_Lane has joined #openstack-infra | 03:35 | |
*** matsuhashi has quit IRC | 03:36 | |
*** senk has joined #openstack-infra | 03:41 | |
*** matsuhashi has joined #openstack-infra | 03:41 | |
*** matsuhashi has quit IRC | 03:41 | |
*** matsuhashi has joined #openstack-infra | 03:42 | |
*** senk has quit IRC | 03:45 | |
*** basha has joined #openstack-infra | 03:45 | |
*** matsuhashi has quit IRC | 03:46 | |
*** CaptTofu has quit IRC | 03:47 | |
*** CaptTofu has joined #openstack-infra | 03:48 | |
*** matsuhashi has joined #openstack-infra | 03:49 | |
*** basha_ has joined #openstack-infra | 03:50 | |
*** basha has quit IRC | 03:52 | |
*** basha_ is now known as basha | 03:52 | |
*** basha has quit IRC | 03:53 | |
*** SergeyLukjanov has joined #openstack-infra | 03:54 | |
*** jerryz has quit IRC | 04:01 | |
*** wenlock has quit IRC | 04:04 | |
*** sarob has joined #openstack-infra | 04:11 | |
*** erfanian has quit IRC | 04:14 | |
*** D30 has joined #openstack-infra | 04:20 | |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/config: Fix Doc Location for Transifex https://review.openstack.org/51112 | 04:21 |
clarkb | fifieldt: you around? | 04:21 |
fifieldt | yessir clarkb | 04:22 |
fifieldt | the sun is up and doing well | 04:22 |
clarkb | fifieldt: cool. We would like to add Ironic to transifex and I figured I should ask how you would like to go about adding new projects | 04:22 |
fifieldt | right, yes, that procedure should be documented | 04:22 |
fifieldt | I take it you're most interested in the transifex side of things? | 04:23 |
clarkb | I think I have sufficient permissions to do it, but didn't want to be sidestepping things | 04:23 |
clarkb | fifieldt: right | 04:23 |
clarkb | fifieldt: I can send an email or submit a bug or whatever is best for you | 04:23 |
fifieldt | if you want, we can step through it now and just do it? | 04:23 |
clarkb | sure | 04:23 |
fifieldt | and I can update the wiki at the same time | 04:23 |
fifieldt | so, we start in the OpenStack "organisation" on transifex | 04:23 |
fifieldt | https://www.transifex.com/organization/openstack | 04:23 |
fifieldt | at the top of the projects list is the "+ NEW" button | 04:24 |
fifieldt | we type in a name, and description as appropriate | 04:24 |
clarkb | yup, I have clicked the NEW button | 04:24 |
fifieldt | and importantly: set the source language to English (en) | 04:24 |
clarkb | fifieldt: and the name is the project less openstack/ ? | 04:24 |
fifieldt | yes | 04:24 |
fifieldt | the openstack organisation provides the openstack bit | 04:25 |
fifieldt | choose "Permissive Open Source" as the license | 04:25 |
fifieldt | and paste the URL for the source (either github or git.openstack.org) in the "source code URL" box | 04:25 |
fifieldt | once you have created the project, go to its page and click the "Manage" button | 04:26 |
clarkb | fifieldt: does the URL for the source need to be a clonable path? | 04:26 |
clarkb | or is that just a handy link for humans? | 04:26 |
fifieldt | just a handy link for humans | 04:26 |
clarkb | ok I am on the manage page | 04:26 |
*** basha has joined #openstack-infra | 04:27 | |
fifieldt | feel free to fill out a long description, home page, if you want, | 04:27 |
fifieldt | but the important bit here is maintainers | 04:27 |
fifieldt | sorry | 04:27 |
fifieldt | not maintainers | 04:27 |
fifieldt | access control | 04:27 |
fifieldt | set the "Project Type" to "Outsourced project" | 04:27 |
clarkb | fifieldt: under features is a TM check box. should I check that? | 04:27 |
fifieldt | and "Outsource Access to" OpenStack | 04:27 |
fifieldt | yes, that is a good idea clarkb | 04:28 |
clarkb | ok TM check box checked and project outsourced to openstack | 04:28 |
fifieldt | great | 04:28 |
clarkb | now I need to add maintainers | 04:28 |
fifieldt | in theory, that is done through the OpenStack organisation | 04:29 |
clarkb | oh | 04:29 |
fifieldt | but you can add anyone you think is relevant to an individual project | 04:29 |
clarkb | fifieldt: can you check if you have management perms on Ironic? | 04:29 |
clarkb | you haven't been explicitly added but are part of the project hub | 04:29 |
fifieldt | I do indeed | 04:29 |
fifieldt | so no problems with permissions | 04:29 |
clarkb | cool I will leave it as is then | 04:29 |
fifieldt | yay :) | 04:29 |
clarkb | is that it for the transifex side? | 04:30 |
fifieldt | yes | 04:30 |
clarkb | awesome thanks | 04:30 |
fifieldt | well | 04:30 |
fifieldt | there is one thing I'm not 100% sure of | 04:30 |
fifieldt | that is whether there's a need to manually create the "Resources" the first time | 04:30 |
fifieldt | I think the client can do that | 04:30 |
fifieldt | but I'm not 100% sure | 04:30 |
clarkb | I think the client can do that too | 04:30 |
fifieldt | great | 04:30 |
fifieldt | then yes, that should be everything | 04:31 |
clarkb | as other new projects haven't needed to do anything under resources, instead jenkins jobs push to them and they are automagically added | 04:31 |
fifieldt | excellent | 04:31 |
fifieldt | it's good to get confirmation on that | 04:31 |
clarkb | fifieldt: I will try to remember and double check ironic once the jenkins jobs are in place | 04:31 |
fifieldt | cheers | 04:31 |
clarkb | but I haven't heard complaining about it not working so it must work right? :) | 04:31 |
fifieldt | right :) | 04:32 |
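On the client side that the exchange above leaves to "the client can do that", resources are normally driven by a `.tx/config` in the project tree; a hypothetical minimal example for a project like ironic (the resource slug, file paths, and host are illustrative, not taken from the real repo):

```ini
# Hypothetical .tx/config -- slug and paths are invented for illustration.
[main]
host = https://www.transifex.com

[ironic.ironic-translations]
source_lang = en
source_file = ironic/locale/ironic.pot
file_filter = ironic/locale/<lang>/LC_MESSAGES/ironic.po
type = PO
```

With a file like this in place, a push of the source file can create the resource on first upload, which matches the observation that jenkins jobs "automagically" add them.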
fifieldt | https://review.openstack.org/#/c/51112 <-- though, speaking of failing jenkins jobs, how do you feel about this? :) I'd like to get manuals working again :( | 04:32 |
clarkb | devananda: ^ you are ready for the jenkins jobs | 04:32 |
clarkb | fifieldt: 51112 lgtm +2'd | 04:32 |
fifieldt | cheers | 04:32 |
* fifieldt wonders who else he can bother at this insane timezone | 04:33 | |
*** markmcclain has quit IRC | 04:38 | |
fifieldt | dammit clarkb, now I have to check every project to make sure that TM box is ticked :D | 04:40 |
fifieldt | it must be a new option | 04:42 |
fifieldt | they weren't | 04:42 |
fifieldt | nice job on the discovery :) | 04:42 |
*** senk has joined #openstack-infra | 04:42 | |
*** DennyZhang has quit IRC | 04:46 | |
*** senk has quit IRC | 04:47 | |
clarkb | fifieldt: :) | 04:47 |
*** changbl has quit IRC | 04:47 | |
*** changbl has joined #openstack-infra | 04:51 | |
*** DennyZhang has joined #openstack-infra | 04:56 | |
*** sarob has quit IRC | 05:02 | |
*** sarob has joined #openstack-infra | 05:02 | |
*** boris-42 has joined #openstack-infra | 05:04 | |
*** afazekas has joined #openstack-infra | 05:06 | |
*** sarob has quit IRC | 05:07 | |
*** SergeyLukjanov has quit IRC | 05:08 | |
*** afazekas has quit IRC | 05:11 | |
*** ryanpetrello has joined #openstack-infra | 05:17 | |
*** ryanpetrello has quit IRC | 05:18 | |
*** changbl has quit IRC | 05:38 | |
*** senk has joined #openstack-infra | 05:43 | |
*** senk has quit IRC | 05:48 | |
*** cody-somerville has quit IRC | 05:50 | |
*** yaguang has quit IRC | 05:50 | |
*** kong has quit IRC | 05:57 | |
*** Lingxian has joined #openstack-infra | 05:58 | |
openstackgerrit | Endre Karlson proposed a change to openstack-infra/config: Add pypi job to python-libraclient https://review.openstack.org/51069 | 06:05 |
*** DennyZhang has quit IRC | 06:08 | |
*** yolanda has joined #openstack-infra | 06:11 | |
*** sarob has joined #openstack-infra | 06:13 | |
openstackgerrit | Endre Karlson proposed a change to openstack-infra/config: Add / Change python-libraclient jobs https://review.openstack.org/51069 | 06:17 |
*** sarob has quit IRC | 06:18 | |
*** mkoderer_ is now known as mkoderer | 06:37 | |
*** senk has joined #openstack-infra | 06:45 | |
*** senk has quit IRC | 06:50 | |
*** uvirtbot has joined #openstack-infra | 06:52 | |
*** mkerrin has quit IRC | 06:55 | |
*** yamahata has joined #openstack-infra | 06:58 | |
*** mkerrin has joined #openstack-infra | 07:00 | |
*** cody-somerville has joined #openstack-infra | 07:09 | |
*** cody-somerville has quit IRC | 07:09 | |
*** cody-somerville has joined #openstack-infra | 07:09 | |
*** mancdaz_ has quit IRC | 07:14 | |
*** slong has quit IRC | 07:15 | |
*** mancdaz has joined #openstack-infra | 07:15 | |
openstackgerrit | Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version https://review.openstack.org/51131 | 07:15 |
*** cody-somerville has quit IRC | 07:16 | |
openstackgerrit | Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version https://review.openstack.org/51131 | 07:17 |
*** D30 has quit IRC | 07:17 | |
*** D30 has joined #openstack-infra | 07:22 | |
*** bauzas has joined #openstack-infra | 07:22 | |
bauzas | hi all | 07:22 |
bauzas | I'm having trouble with the py27 build for a review: http://logs.openstack.org/70/50970/1/check/gate-climate-python27/5ede61d/console.html | 07:23 |
bauzas | my own tow -r -epy27 works like a charm | 07:23 |
bauzas | s/tow/tox | 07:23 |
bauzas | but the oslo config on the Jenkins VM is incorrect | 07:24 |
bauzas | I checked both Jenkins and tox venvs | 07:24 |
bauzas | and the pip freeze is slightly different | 07:24 |
*** fbo_away is now known as fbo | 07:25 | |
bauzas | oslo.config is the same 1.2.1 | 07:25 |
bauzas | but I found trace of oslo-config on Jenkins | 07:26 |
*** osanchez has joined #openstack-infra | 07:26 | |
bauzas | which is an early build | 07:26 |
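The comparison bauzas is doing can be sketched by diffing the two freeze listings; the package lists below are fabricated (the stale `oslo-config` entry mirrors what he reports finding, the versions are invented):

```shell
# Fabricated freeze output: the Jenkins venv carries a stale
# "oslo-config" (the early package name) alongside "oslo.config".
printf 'oslo.config==1.2.1\nsix==1.4.1\n' > local-freeze.txt
printf 'oslo-config==1.1.0\noslo.config==1.2.1\nsix==1.4.1\n' > jenkins-freeze.txt

# In a real session these would come from
# ".tox/py27/bin/pip freeze | sort" and the Jenkins console log.
diff jenkins-freeze.txt local-freeze.txt || true
```

Only the stale entry shows up in the diff, which makes a leftover early build of oslo-config on the Jenkins VM easy to spot.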
*** D30 has quit IRC | 07:27 | |
*** dafter has joined #openstack-infra | 07:29 | |
*** cody-somerville has joined #openstack-infra | 07:30 | |
*** shardy_afk is now known as shardy | 07:31 | |
*** D30 has joined #openstack-infra | 07:32 | |
*** cody-somerville has quit IRC | 07:37 | |
*** senk has joined #openstack-infra | 07:46 | |
*** senk has quit IRC | 07:51 | |
*** basha has quit IRC | 07:59 | |
*** che-arne has joined #openstack-infra | 08:01 | |
*** luhrs1 has joined #openstack-infra | 08:01 | |
*** jpich has joined #openstack-infra | 08:05 | |
*** dizquierdo has joined #openstack-infra | 08:06 | |
*** odyssey4me has joined #openstack-infra | 08:06 | |
*** yassine has joined #openstack-infra | 08:08 | |
*** amotoki has quit IRC | 08:11 | |
*** derekh has joined #openstack-infra | 08:18 | |
*** odyssey4me has quit IRC | 08:26 | |
*** markmc has joined #openstack-infra | 08:29 | |
*** odyssey4me has joined #openstack-infra | 08:33 | |
*** dizquierdo has quit IRC | 08:38 | |
*** hashar has joined #openstack-infra | 08:41 | |
*** dims has joined #openstack-infra | 08:42 | |
*** dkehn_ has joined #openstack-infra | 08:44 | |
*** yamahata has quit IRC | 08:46 | |
*** dkehn has quit IRC | 08:47 | |
*** senk has joined #openstack-infra | 08:47 | |
*** dims has quit IRC | 08:50 | |
*** senk has quit IRC | 08:52 | |
openstackgerrit | Lucas Alvares Gomes proposed a change to openstack/requirements: Added lower version boundary for netaddr https://review.openstack.org/49530 | 08:55 |
*** dizquierdo has joined #openstack-infra | 09:08 | |
openstackgerrit | Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version https://review.openstack.org/51131 | 09:12 |
sileht | thx fungi I have seen the pypi mirror updated ! | 09:13 |
*** johnthetubaguy has joined #openstack-infra | 09:19 | |
*** basha has joined #openstack-infra | 09:22 | |
*** johnthetubaguy has quit IRC | 09:31 | |
*** johnthetubaguy has joined #openstack-infra | 09:31 | |
*** beagles has joined #openstack-infra | 09:45 | |
openstackgerrit | Mehdi Abaakouk proposed a change to openstack-infra/jenkins-job-builder: Allow macro is dict key https://review.openstack.org/51159 | 09:46 |
*** alexpilotti has joined #openstack-infra | 09:46 | |
*** senk has joined #openstack-infra | 09:48 | |
*** senk has quit IRC | 09:52 | |
*** markmc has quit IRC | 10:02 | |
*** xchu has quit IRC | 10:04 | |
*** alexpilotti has joined #openstack-infra | 10:07 | |
*** pcm_ has joined #openstack-infra | 10:07 | |
*** pcm_ has quit IRC | 10:09 | |
*** pcm_ has joined #openstack-infra | 10:09 | |
*** markmc has joined #openstack-infra | 10:11 | |
*** D30 has quit IRC | 10:12 | |
* ttx juggles with CIVS since it doesn't allow more than 1000 voters | 10:17 | |
*** fifieldt has quit IRC | 10:18 | |
ttx | Fun fact: there is one voter that was left out by CIVS for a mysterious reason and I have no way of determining who it is. | 10:18 |
*** fifieldt has joined #openstack-infra | 10:18 | |
ttx | fungi, jeblair, mordred: multiple failures downloading deps on various jobs | 10:23 |
ttx | http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html | 10:24 |
ttx | looks like network issues | 10:24 |
ttx | doesn't hit the same dep every time | 10:24 |
* ttx lunches | 10:24 | |
sdague | ttx: it doesn't allow more than 1000 voters? | 10:26 |
*** branen has quit IRC | 10:30 | |
openstackgerrit | Endre Karlson proposed a change to openstack-infra/config: Add / Change python-libraclient jobs https://review.openstack.org/51069 | 10:31 |
*** dims has joined #openstack-infra | 10:34 | |
*** mestery has joined #openstack-infra | 10:40 | |
*** hashar_ has joined #openstack-infra | 10:46 | |
*** johnthetubaguy has quit IRC | 10:47 | |
*** johnthetubaguy has joined #openstack-infra | 10:48 | |
*** hashar has quit IRC | 10:48 | |
*** hashar_ is now known as hashar | 10:48 | |
*** mestery has quit IRC | 10:48 | |
*** senk has joined #openstack-infra | 10:49 | |
*** senk has quit IRC | 10:53 | |
*** boris-42 has quit IRC | 10:55 | |
*** guohliu has quit IRC | 11:01 | |
openstackgerrit | Qiu Yu proposed a change to openstack-infra/jeepyb: Print help message and exit if no config file by default https://review.openstack.org/51182 | 11:03 |
*** cody-somerville has joined #openstack-infra | 11:13 | |
soren | ttx: CIVS is free software, IIRC. You might be able to install it somewhere and crank that limit up to eleven... thousand. | 11:17 |
*** michchap has quit IRC | 11:26 | |
*** michchap has joined #openstack-infra | 11:26 | |
*** CaptTofu has quit IRC | 11:27 | |
*** CaptTofu has joined #openstack-infra | 11:27 | |
*** cody-somerville has quit IRC | 11:31 | |
sdague | might need that for next go around. the ATC growth being what it is | 11:32 |
*** hashar has quit IRC | 11:34 | |
ttx | soren: yes, it's a bit weird but I ran it locally recently to test the ability to rerun ballots with alternative algorithms | 11:35 |
ttx | sdague: you can actually send voters in multiple batches of <1000 | 11:35 |
sdague | ah, gotcha | 11:35 |
*** SergeyLukjanov has joined #openstack-infra | 11:35 | |
ttx | sdague: but i wasn't sure of that until I tried and already sent half of them :) | 11:36 |
sdague | heh | 11:37 |
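The batching ttx describes can be sketched with `split` (the voter list here is fabricated; CIVS caps a single upload at 1000 addresses, so batches of 999 stay safely under it):

```shell
# Fabricated voter list: 2500 addresses, one per line.
seq 1 2500 | sed 's/.*/voter&@example.org/' > voters.txt

# Split into chunks of at most 999 addresses: batch-aa, batch-ab, batch-ac.
split -l 999 voters.txt batch-

wc -l batch-*
```

Each `batch-*` file can then be pasted into CIVS as a separate upload, which is the "multiple batches of <1000" approach from the conversation.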
*** SergeyLukjanov is now known as _SergeyLukjanov | 11:37 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 11:37 | |
openstackgerrit | Ekaterina Fedorova proposed a change to openstack-infra/config: Add murano-repository to stackforge https://review.openstack.org/50026 | 11:38 |
ttx | Err... the test nodes graph at http://status.openstack.org/zuul/ looks highly suspicious | 11:41 |
ttx | fungi, jeblair, mordred: ^ may or may not be related with the network issues we're experiencing fetching deps | 11:42 |
ttx | At this rate we'll reach universe entropy in 67 minutes | 11:42 |
ttx | sdague: ever saw something like it ? | 11:43 |
sdague | yeh, that looks crazy | 11:44 |
sdague | I wonder if the network timeouts are preventing the node builds | 11:44 |
sdague | which would make sense | 11:44 |
ttx | sdague: yes, definitely started to appear at around the same time | 11:44 |
sdague | so they enter that state, but stall out | 11:44 |
sdague | and the system is correctly trying to build more, because it's not getting any out the other side | 11:45 |
sdague | because we are definitely backed up on devstack nodes | 11:45 |
ttx | it's like watching a train wreck in slow motion | 11:45 |
ttx | good thing I got most of my patches merged earlier. | 11:46 |
sdague | heh | 11:46 |
sdague | who knew that skynet would need this much care and feeding | 11:46 |
ttx | sdague: I was thinking of issuing a statusbot alert. | 11:47 |
sdague | probably fair | 11:47 |
ttx | on it | 11:47 |
ttx | #status notice Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun) | 11:49 |
*** senk has joined #openstack-infra | 11:49 | |
ttx | I like how every time I need to use that bot it miserably fails | 11:50 |
ttx | where the heck is openstackstatus bot | 11:50 |
*** basha has quit IRC | 11:50 | |
*** senk has quit IRC | 11:53 | |
-ttx- Top issues right now: (1) test node starvation (2) networking issues fetching dep (might be the cause of 1) and (3) no statusbot to warn people | 11:54 | |
ttx | fungi, jeblair, mordred: ^ | 11:55 |
*** thomasm has joined #openstack-infra | 11:55 | |
*** thomasm has quit IRC | 11:55 | |
*** thomasm has joined #openstack-infra | 11:56 | |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/config: Fix Doc Location for Transifex https://review.openstack.org/51112 | 11:56 |
ttx | sdague: wondering if we are not past the peak of network issues and starting to gradually recover | 11:58 |
ttx | looking at the graph and the status of the very few tests that run | 11:58 |
sdague | yeh, could be | 11:59 |
*** boris-42 has joined #openstack-infra | 11:59 | |
*** basha has joined #openstack-infra | 12:00 | |
sdague | so in one of the ways to make skynet smarter, I wonder if we should consider auto respooling checks that hit a network timeout | 12:01 |
ttx | sdague: that wouldn't make it smarter, but would certainly make it more resilient | 12:02 |
sdague | yeh | 12:02 |
*** basha has quit IRC | 12:02 | |
*** markmc has quit IRC | 12:02 | |
*** dizquierdo has quit IRC | 12:03 | |
*** w_ has joined #openstack-infra | 12:04 | |
*** markmc has joined #openstack-infra | 12:05 | |
*** CaptTofu has quit IRC | 12:05 | |
*** olaph has quit IRC | 12:05 | |
*** CaptTofu has joined #openstack-infra | 12:06 | |
*** dprince has joined #openstack-infra | 12:10 | |
*** cody-somerville has joined #openstack-infra | 12:22 | |
mordred | yay! things have fixed themselves before I woke up? | 12:23 |
BobBall | they knew you were coming | 12:23 |
BobBall | and were scared... | 12:23 |
mordred | BobBall: ++ | 12:24 |
thomasm | 'Tis a good day. | 12:26 |
*** thomasbiege has joined #openstack-infra | 12:30 | |
*** dcramer_ has quit IRC | 12:31 | |
*** adalbas has joined #openstack-infra | 12:32 | |
*** matsuhashi has quit IRC | 12:33 | |
*** matsuhashi has joined #openstack-infra | 12:34 | |
*** aspiers has quit IRC | 12:34 | |
*** nosnos has quit IRC | 12:35 | |
*** nosnos has joined #openstack-infra | 12:35 | |
*** aspiers has joined #openstack-infra | 12:38 | |
openstackgerrit | Roman Podolyaka proposed a change to openstack-infra/config: Fix sqlalchemy-migrate py26/sa07 job https://review.openstack.org/44686 | 12:38 |
*** matsuhashi has quit IRC | 12:38 | |
*** dafter has quit IRC | 12:40 | |
*** nosnos has quit IRC | 12:40 | |
*** dafter has joined #openstack-infra | 12:41 | |
ttx | mordred: no | 12:41 |
*** weshay has joined #openstack-infra | 12:41 | |
ttx | mordred: start with the scary "test nodes" graph @ http://status.openstack.org/zuul/ | 12:41 |
ttx | mordred: then look at download fails @ http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html | 12:42 |
ttx | (might be same issue around networking) | 12:42 |
*** hashar has joined #openstack-infra | 12:42 | |
ttx | mordred: then finally, where the heck is statusbot when you need it ? | 12:42 |
ttx | mordred: gate is totally wedged right now. | 12:43 |
fifieldt | that looks awesome | 12:44 |
bauzas | sdague: ping ? | 12:44 |
fifieldt | but the amount of scrolling was annoying to get to the graphs ;) | 12:44 |
ttx | fifieldt: if I didn't need it urgently for RC2 production I would probably find it funny too | 12:44 |
bauzas | sdague: I'm now at my office, still broken about my oslo.config version | 12:44 |
bauzas | btw, maybe ppl could help me ? | 12:45 |
fifieldt | sorry ttx :) 2345 here and the brain is off, it seems | 12:45 |
* ttx sees his weekend vanish | 12:45 | |
bauzas | http://logs.openstack.org/70/50970/1/check/gate-climate-python27/5ede61d/console.html | 12:45 |
bauzas | oslo-config got pulled from Jenkins while it shouldn't | 12:45 |
bauzas | my own tox venv on my laptop doesn't get this pretty old oslo-config beta version | 12:45 |
*** dafter has quit IRC | 12:46 | |
bauzas | the gate should be fine | 12:46 |
mordred | why are we timing out on fetches from pypi.o.o ? | 12:46 |
*** jhesketh has quit IRC | 12:46 | |
ttx | mordred: you tell me | 12:46 |
*** openstackstatus has joined #openstack-infra | 12:47 | |
mordred | ok. there's statusbot | 12:47 |
ttx | yay, a bot | 12:47 |
ttx | #status notice Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun) | 12:48 |
openstackstatus | NOTICE: Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun) | 12:48 |
*** basha has joined #openstack-infra | 12:48 | |
openstackgerrit | Renat Akhmerov proposed a change to openstack-infra/config: Add configuration for Mistral project https://review.openstack.org/51205 | 12:49 |
*** dhouck_ has joined #openstack-infra | 12:50 | |
* ttx goes to get some fresh air | 12:50 | |
openstackgerrit | Emilien Macchi proposed a change to openstack-infra/config: Add IRC bot on #openstack-rally for Gerrit changes https://review.openstack.org/51207 | 12:58 |
*** dkehn_ is now known as dkehn | 12:58 | |
*** basha has quit IRC | 12:58 | |
*** CaptTofu has quit IRC | 13:00 | |
*** CaptTofu has joined #openstack-infra | 13:00 | |
ttx | mordred: fwiw we might be past the peak of networking issues and slowly recovering | 13:02 |
yolanda | hi, i'm trying to create users automatically in gerrit, they are created, correctly assigned to groups, but when i click on their links (aka /#/dashboard/xxxx), it shows me a not found page, what could be the issue there? | 13:02 |
ttx | mordred: there are a few successful test runs by now, a few hours earlier they were all failing | 13:03 |
yolanda | i can see the /dashboard/ url for the logged user, but not for others, although i'm logged with an admin user | 13:03 |
ttx | mordred: hard to tell more from where I stand | 13:03 |
*** miqui has joined #openstack-infra | 13:05 | |
* mordred now useless and on the phone once more | 13:05 | |
*** blamar has joined #openstack-infra | 13:07 | |
*** michchap has quit IRC | 13:11 | |
*** michchap has joined #openstack-infra | 13:14 | |
*** michchap has quit IRC | 13:16 | |
*** julim has joined #openstack-infra | 13:16 | |
*** sandywalsh_ has quit IRC | 13:16 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Fix test_files_at_url_pass https://review.openstack.org/50706 | 13:16 |
*** DennyZhang has joined #openstack-infra | 13:16 | |
*** mriedem has joined #openstack-infra | 13:18 | |
*** basha has joined #openstack-infra | 13:19 | |
fungi | having a look | 13:22 |
ttx | fungi: network issues preventing dep fetching, potentially also the cause of test nodes starvation | 13:22 |
ttx | (executive summary) | 13:23 |
ttx | see scary "test nodes" graph @ http://status.openstack.org/zuul/ and example dep fetching fail @ http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html | 13:23 |
*** matty_dubs|gone is now known as matty_dubs | 13:25 | |
fungi | yeah, looking at graphs and checking rackspace's network status info | 13:25 |
*** thedodd has joined #openstack-infra | 13:27 | |
fungi | yeah, rs mentions no current issues and no posted maintenance for today | 13:27 |
*** sandywalsh has joined #openstack-infra | 13:28 | |
sdague | mordred: having an issue with the cookiecutter repo - http://paste.openstack.org/show/48266/ | 13:32 |
fungi | mmm, the /srv/static/doc filesystem on static.o.o is slap full. not sure whether that's having an impact but i'll give it a little more breathing room | 13:32 |
*** markmcclain has joined #openstack-infra | 13:33 | |
fungi | er, /srv/static/docs-draft (cacti truncated the label in its graph) | 13:33 |
fifieldt | there have been many more doc patches than normal of late | 13:33 |
*** russellb is now known as rustlebee | 13:36 | |
sdague | oh, never mind | 13:37 |
fungi | i'm increasing it by about 25% for now and then we can discuss whether we purge drafts more aggressively or add still more space | 13:37 |
sdague | mordred: it's probably a good idea to kill - https://github.com/emonty/cookiecutter-openstack as it is a high hit for openstack cookiecutter | 13:37 |
*** dafter has joined #openstack-infra | 13:38 | |
*** dafter has quit IRC | 13:38 | |
*** dafter has joined #openstack-infra | 13:38 | |
mordred | sdague: ++ | 13:40 |
*** dizquierdo has joined #openstack-infra | 13:41 | |
fungi | so, on the nodepool.o.o graphs i see gaps around the time the server building volume increases there on the graph. either it went to lunch and stopped responding to snmp for ~20 minutes or there was a network blip (but i don't find gaps from the same time period for other hosts) | 13:41 |
fungi | i'll start digging into logs on the nodepool server | 13:41 |
*** bnemec is now known as beekneemech | 13:44 | |
fungi | there were some errors in the nodepool image log around 0230 utc, but that's way earlier than the symptoms began and i don't see a recurrence there | 13:45 |
ttx | fungi: is networking working now on those machines ? | 13:45 |
fungi | seems fine at the moment. i've got a ping test going to static.o.o right now as well as a few devstack slaves | 13:46 |
*** fifieldt has quit IRC | 13:47 | |
ttx | fungi: the test nodes building graph still goes up the roof | 13:48 |
*** basha has quit IRC | 13:48 | |
fungi | ttx: yeah, hoping the logs will give me some inkling of why | 13:48 |
*** beagles is now known as seagulls | 13:49 | |
ttx | fungi: our collective guess was that they stalled on dep loading | 13:49 |
ttx | fungi: the issue might be gone now but they are still preventing new ones from being spun | 13:49 |
*** aspiers has quit IRC | 13:49 | |
* ttx wonders how much radical killing would be a solution at this point | 13:49 | |
fungi | as of this week, nodepool will try to proactively build additional servers based on perceived demand for waiting jobs so that may be what we're seeing on the graphs | 13:49 |
ttx | all I can say is that status has not moved in gate for the last 5 hours | 13:50 |
jd__ | ttx: are you threatening fungi? ;) | 13:50 |
*** alaski is now known as lascii | 13:50 | |
fungi | but yes it could be a symptom of network issues in hpcloud, though i'm finding no evidence of that yet | 13:50 |
ttx | as in.. same jobs are waiting for resources | 13:51 |
ttx | so my guess is that no new test resources are made available. It's not slow, it's stuck | 13:51 |
*** dcramer_ has joined #openstack-infra | 13:51 | |
fungi | nodepool's "alien" list (servers it sees but didn't create) is fairly long. not sure if that's a related symptom | 13:52 |
*** prad_ has joined #openstack-infra | 13:52 | |
*** aspiers has joined #openstack-infra | 13:53 | |
fungi | but nodepool is definitely building and deleting servers based on what i see in its logs, and doesn't mention any serious issues so it could be a tuning problem | 13:53 |
ttx | fungi: there hasn't been a devstack being run that I could see in the last.. 4 hours now | 13:54 |
fungi | oh, nevermind. most of those are our other non-devstack slaves in rackspace | 13:54 |
ttx | right | 13:54 |
fungi | the alien nodes it lists, i mean | 13:54 |
ttx | (we still get pep8 tests run) | 13:54 |
*** pabelanger has joined #openstack-infra | 13:55 | |
ttx | 4h20min to be precise | 13:55 |
ttx | but the networking issue is gone in the latest non-devstack runs we see | 13:56 |
fungi | there are definitely still *some* devstack jobs running... https://jenkins01.openstack.org/job/check-tempest-devstack-vm-full/1954/console | 13:56 |
*** sarob has joined #openstack-infra | 13:57 | |
fungi | ahh, here we go | 13:57 |
ttx | fungi: yes, a dozen of them in the check line | 13:57 |
ttx | none in the gate | 13:57 |
fungi | there are no hpcloud slaves in jenkins, only rackspace | 13:58 |
fungi | that arms me with something more i can look for | 13:58 |
*** DennyZhang has quit IRC | 14:01 | |
ttx | sigh, looks like a busy saturday coming up for me | 14:01 |
*** marun has quit IRC | 14:02 | |
*** DennyZhang has joined #openstack-infra | 14:02 | |
jd__ | ttx: yup, i'll try to be available too if you want to handle Ceilometer RC2 tomorrow | 14:02 |
ttx | even if we solved it now the lines are so long I won't get the RC2 stuff in before eod | 14:02 |
fungi | http://paste.openstack.org/show/48272/ | 14:02 |
ttx | and I have family visiting, yay | 14:02 |
*** sarob has quit IRC | 14:02 | |
jd__ | 191 building? O_o is that normal? | 14:03 |
fungi | most hpcloud nodes are in a building state and many deleting with very few ready, similar to the overall nodes graph on the zuul status page | 14:03 |
*** changbl has joined #openstack-infra | 14:03 | |
ttx | fungi: all our gate testing goes to hp nodes ? | 14:03 |
fungi | well, i know we've said in the past that we throw away something like 75% of the slaves we build on hpcloud because after waiting a couple minutes for them to boot they never show up | 14:03 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 14:04 | |
fungi | ttx: yes, we have very few rackspace nodes (much lower quotas) and the slaves are slower | 14:04 |
*** _SergeyLukjanov is now known as SergeyLukjanov | 14:04 | |
ttx | maybe HP asked all their servers to work back in their datacenters | 14:04 |
fungi | looking to see if i can figure out what's up with hpcloud and hopefully we can get this back on track | 14:04 |
* ttx hesitates to cut new RC2s right now, fearing that pre-release jobs would get queued forever | 14:06 | |
*** hashar has quit IRC | 14:11 | |
*** yassine has quit IRC | 14:13 | |
fungi | #status alert The Infrastructure team is working through some devstack node starvation issues which is currently holding up gating and slowing checks. ETA 1600 UTC | 14:14 |
openstackstatus | NOTICE: The Infrastructure team is working through some devstack node starvation issues which is currently holding up gating and slowing checks. ETA 1600 UTC | 14:14 |
*** ChanServ changes topic to "The Infrastructure team is working through some devstack node starvation issues which is currently holding up gating and slowing checks. ETA 1600 UTC" | 14:14 | |
mordred | fungi: wow. we have no HP nodes? | 14:14 |
fungi | mordred: well, we have a ton, but... we're not using them | 14:14 |
mordred | hrm | 14:14 |
fungi | mordred: http://paste.openstack.org/show/48273/ | 14:15 |
fungi | i'm hunting for any real error to explain it | 14:15 |
fungi | we have a handful of expected errors in the nodepool log for things like timeouts deleting servers, but they're few and far between and not for several hours now | 14:18 |
fungi | stuff that gets retried and would have errored again if it kept happening | 14:18 |
annegentle_ | node starvation sounds serious! Rooting for you guys. | 14:20 |
fungi | annegentle_: thanks! | 14:21 |
*** rahmu has quit IRC | 14:24 | |
fungi | the handful of devstack servers nodepool knows about in hpcloud are also not getting used. i just ssh'd into one of them and it had an uptime of 20 hours | 14:25 |
fungi | ssh'd into one in a deleting state and it's got an uptime of 6 hours. i suspect delete (and maybe build?) calls are not being respected | 14:26 |
*** yassine has joined #openstack-infra | 14:28 | |
*** blamar has quit IRC | 14:28 | |
fungi | novaclient itself shows the same server as "active" state | 14:28 |
mordred | fungi: fantastic | 14:29 |
mordred | fungi: anything I can do to help? | 14:29 |
* mordred is off phone now | 14:29 | |
fungi | no idea. poke at things. i'm still casting my net wide | 14:30 |
fungi | doing a nova delete of that "deleting" node seems to work | 14:30 |
fungi | and nodepool is still listing that node in a "delete" state even after it's gone in hpcloud | 14:31 |
mordred | fungi: that sounds very weird | 14:31 |
fungi | maybe nodepool just hasn't noticed yet? (or maybe it doesn't expect anyone else to delete its nodes) | 14:32 |
fungi | anyway, since none of the devstack-gate slaves in hpcloud are currently being used, i'm thinking maybe we delete them all and... nodepool is stateless, right? just restart it? | 14:33 |
*** ruhe has joined #openstack-infra | 14:35 | |
fungi | but i'm uneasy going behind its back and making changes, restarting it and losing state which might help point us to the actual error, et cetera | 14:35 |
mordred | yeah. I'm very shaky on doing things to nodepool without jeblair | 14:40 |
*** wenlock has joined #openstack-infra | 14:40 | |
*** datsun180b has joined #openstack-infra | 14:41 | |
fungi | i only just noticed that one of the columns in nodepool list's output is age in hours. quite a few of the "building" nodes have an age over 4 hours | 14:41 |
fungi | those are the oldest in that state and that's about the timeframe where we started seeing issues, judging from the graphs | 14:42 |
*** cody-somerville has quit IRC | 14:43 | |
fungi | actually some almost 6 hours old | 14:44 |
fungi | around 0850 utc | 14:44 |
fungi | all the nodes in a nodepool delete state are not much older than that. maybe 40 minutes older, tops | 14:46 |
fungi | from around 0825 | 14:46 |
*** cody-somerville has joined #openstack-infra | 14:46 | |
*** pentameter has joined #openstack-infra | 14:47 | |
openstackgerrit | Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version https://review.openstack.org/51131 | 14:48 |
fungi | so i think starting around thenish, hpcloud ceased acting on any nova delete or create calls. maybe nodepool lost a persistent connection and didn't realize/retry? | 14:48 |
fungi | it has established https sockets (so it thinks) to addresses very similar to what the hpcloud service endpoint resolves to. sniffing now to see if those are actually dead connections | 14:52 |
openstackgerrit | Qiu Yu proposed a change to openstack-infra/jeepyb: Print help message and exit if no config file by default https://review.openstack.org/51182 | 14:52 |
fungi | it's been a couple minutes already and i see no traffic at all to/from those addresses | 14:53 |
*** rcleere has joined #openstack-infra | 14:54 | |
*** DennyZhang has quit IRC | 14:54 | |
*** basha has joined #openstack-infra | 14:55 | |
fungi | oho, so around 0850 nodepool *did* log this gem... ConnectionError: HTTPSConnectionPool(host='ord.servers.api.rackspacecloud.com', port=443): Max retries exceeded with url: /v2/637776/servers/a14b333a-9b03-48c8-b144-4f21a3eec405 (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer) | 14:55 |
fungi | nevermind. that was rackspace | 14:55 |
fungi | pretty large uptick in ssh timeouts waiting for servers to launch around that timeframe too | 14:58 |
*** dansmith is now known as Steely_Dan | 14:59 | |
mordred | spectacular | 15:00 |
mordred | so have we perhaps exceeded another timeout threshold? | 15:00 |
fungi | not sure. also i've been sniffing for any traffic to/from 168.87.243.0/24 (where the hpcloud service endpoint resolves into and where nodepool claims to have a couple established https sockets to remote systems) and so far not a single packet for over 10 minutes | 15:02 |
fungi | probably much, much longer, but at least none since i started up tcpdump | 15:02 |
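The failure mode fungi is sniffing for here -- sockets the process still believes are ESTABLISHED while the peer is long gone -- is what TCP keepalives exist to detect. A hedged sketch (illustrative only, not nodepool's actual code; the `TCP_KEEP*` constants are Linux-specific, hence the `hasattr` guards):

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Ask the kernel to probe an idle connection so a silently dead
    peer is detected, instead of the socket sitting in ESTABLISHED
    forever. Sketch only -- parameter values are arbitrary examples."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for opt, val in (("TCP_KEEPIDLE", idle),      # seconds before first probe
                     ("TCP_KEEPINTVL", interval), # seconds between probes
                     ("TCP_KEEPCNT", count)):     # probes before giving up
        if hasattr(socket, opt):  # these constants are platform-dependent
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, opt), val)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # nonzero when set
```

Whether the client library in use (novaclient, in this case) exposes the underlying socket to set this on is a separate question.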
*** boris-42 has quit IRC | 15:03 | |
*** basha has quit IRC | 15:04 | |
dkranz | mordred: I have a process going that is reading the console log for every successful tempest run (listening to gerrit) looking for reported bogus errors. Is that going to annoy any one? | 15:05 |
*** blamar has joined #openstack-infra | 15:05 | |
*** cody-somerville has quit IRC | 15:05 | |
fungi | dkranz: unlikely. are you pulling those console logs from logs.openstack.org? | 15:06 |
dkranz | fungi: Yes. | 15:06 |
fungi | i didn't notice any huge uptick in outbound network utilization on it at any rate | 15:07 |
dkranz | fungi: ok, cool. Just don't want to be a bad citizen... | 15:07 |
dkranz | fungi: This will stop once we start failing builds that have bogus errors (or real ones). | 15:07 |
fungi | dkranz: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=311&rra_id=all | 15:07 |
*** thingee_zzz is now known as thingee | 15:08 | |
fungi | i mean, yes, it's a lot of traffic but it's not at worrying levels, i don't think | 15:08 |
sdague | fungi / clarkb I'm respinning the htmlify-screen-logs.py into an os_loganalyze repository so I can do some sane test additions before upping the complexity for other log times | 15:09 |
sdague | log types | 15:09 |
fungi | though we do seem to have topped out at 100mbps briefly last week | 15:09 |
sdague | should I just github this until good, then pull it back into openstack-infra? | 15:09 |
fungi | sdague: whatever's easy for you. we can import later or you can start off with a basic cookiecutter | 15:09 |
sdague | or would we want it as a gerrit repo earlier | 15:09 |
sdague | fungi: yeh, I started with a cookiecutter | 15:09 |
*** yassine has quit IRC | 15:10 | |
sdague | so it should play nicely later | 15:10 |
fungi | i mean you can start out with your basic cookiecutter in gerrit or you can import it once it's usable--your call | 15:10 |
sdague | I was amused the cookiecutter has 3 pep8 errors in it | 15:10 |
fungi | patches welcome! | 15:10 |
sdague | yeh, I guess it's probably faster to get to working unit tests with me just committing and pushing | 15:10 |
*** mkerrin has quit IRC | 15:11 | |
jeblair | sdague: have you kept up with the -infra thread on log storing/serving? | 15:12 |
sdague | not as much as I probably should | 15:13 |
ttx | jeblair! | 15:13 |
sdague | sorry, it's been one of those weeks | 15:13 |
jeblair | sdague: short: there's an idea that we might want to preprocess logs and statically serve them instead of using the wsgi app | 15:13 |
sdague | jeblair: ok | 15:13 |
jeblair | sdague: i don't think that invalidates any of your past or planned work, but if we decide to go that way, it may change how we use it a bit | 15:13 |
sdague | so that wouldn't let us do the filtering that we're doing now, which is nice | 15:13 |
ttx | jeblair: in case you're in "holy batman, what a backlog" mode, we are currently out of HP devstack nodes, effectively blocking the gate.. for the last 6 hours | 15:14 |
sdague | the filtering being nice, that is | 15:14 |
fungi | jeblair: thoughts on why nodepool is not talking to hpcloud since around 0825 utc (that's the best i've been able to pin details down so far) | 15:14 |
jeblair | sdague: i think we'd only do that if we found a way to accomplish the goals we get by filtering; anyway, your input is very welcome. | 15:14 |
jeblair | sdague: it's all a bit brainstormy right now -- nothing urgent | 15:14 |
jeblair | fungi: i'll go look | 15:14 |
sdague | sure, we going to do a session in HK? | 15:15 |
ttx | jeblair: may or may not be related to network failures we noticed in test jobs fetching deps around the same time (which seem to be fixed now) | 15:15 |
jeblair | sdague: let's | 15:15 |
jeblair | fungi: nodepool has _extensive_ logging | 15:15 |
sdague | that would be good brainstormy time for it. Right now I'd just like to get this to a realm where we aren't dropping all the keystone logs for logstash :) | 15:15 |
fungi | jeblair: yes, i've been trying to make sense of the logging and correlate it to the behavior we're seeing | 15:15 |
jeblair | sdague: yeah; one of the participants on the thread isn't going to make it to hk, so i'm trying to lay some groundwork over email | 15:16 |
sdague | yep, no worries | 15:16 |
sdague | who we going to miss in HK? | 15:16 |
jeblair | sdague: jhesketh | 15:16 |
sdague | ok, gotcha | 15:16 |
fungi | it's not saying things like "i'm trying to build servers and they're never appearing" (in fact it's not saying much at all--i think it's waiting for hours for them to become ready) | 15:16 |
ttx | fungi: could the networking issues have caused permanent damage to that nodepool/HPcloud link ? | 15:16 |
jeblair | fungi: yeah, it looks like they're all stuck in building, and errored out in such a way that the cleanup code failed | 15:17 |
fungi | ttx: i'm not sure. i'm thinking maybe the tcp sockets to the service endpoint are actually dead and the nodepool server still believes them to be in an established state | 15:17 |
*** sandywalsh has quit IRC | 15:17 | |
jeblair | fungi: if you run 'nodepool list' you'll see a lot of 'None' values in the db | 15:17 |
fungi | right, i definitely saw that | 15:17 |
jeblair | like this: http://paste.openstack.org/show/48279/ | 15:18 |
fungi | i also saw the periodic cleanup error, but i expected it to periodically error if it was continuing to have problems, being a periodic cleanup | 15:18 |
fungi | however, it only complained once, then was silent | 15:18 |
jeblair | fungi: so for some reason, we set the cleanup delay for non-ready to 8 hours | 15:18 |
jeblair | fungi: so it's going to wait another 2 hours before it starts deleting these | 15:19 |
jeblair | so good news: it would probably fix itself in 2 hours. :) | 15:19 |
jeblair | we should probably adjust that timing a bit. | 15:19 |
fungi | got it. would have fixed itself while we slept if only it had started sooner | 15:19 |
fungi | so what's the safest way to manually clean those up in the future? | 15:20 |
jeblair | fungi: oh, i may have been wrong -- it may not have failed, you might be right -- it may actually have a couple hundred threads waiting for something | 15:20 |
fungi | delete queries in the db? | 15:20 |
jeblair | i want to spend a minute and try to find out if that's the case | 15:20 |
fungi | certainly. i was very hesitant to disturb its current state lest i lose valuable evidence of the issue | 15:21 |
jeblair | fungi: do you have any logged errors handy? | 15:21 |
fungi | jeblair: not pasted yet, but i can do that | 15:21 |
*** branen has joined #openstack-infra | 15:22 | |
jeblair | ttx: can you tell me about the networking issues? | 15:23 |
*** beekneemech has quit IRC | 15:23 | |
*** sandywalsh has joined #openstack-infra | 15:24 | |
*** bnemec has joined #openstack-infra | 15:24 | |
ttx | jeblair: most tests suddenly started to fail with dep download errors like http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html | 15:24 |
fungi | jeblair: nodepool tracebacks in the log from around the time this started (though quieted down to some ssh timeout errors and then nothing of note for hours): http://paste.openstack.org/show/48280/ | 15:25 |
*** bnemec has quit IRC | 15:25 | |
ttx | at around the same time, the "test nodes" graph on status/zuul started to drink heavily | 15:25 |
jeblair | i note that we have servers stuck in build from both rax and hpcloud | 15:26 |
jeblair | the 'waiting for deletion' timeouts are mostly for hpcloud, but there's one rax | 15:27 |
ttx | jeblair: the starvation only appears to affect devstack/gate nodes | 15:27 |
*** markmcclain has quit IRC | 15:28 | |
jeblair | i don't see anything on rax status, and nothing relevant on hpcloud status | 15:28 |
fungi | jeblair: it might have been a network disruption local to the nodepool server | 15:30 |
fungi | i saw a gap in its cacti graphs from that timeperiod, but couldn't correlate it to any other systems | 15:30 |
*** rnirmal has joined #openstack-infra | 15:32 | |
jeblair | gdb says all the threads are sitting in sem_wait | 15:33 |
jeblair | (well, most of them) | 15:33 |
*** anteaya has joined #openstack-infra | 15:33 | |
*** markmc has quit IRC | 15:33 | |
jeblair | which is really weird because the one locking thing nodepool does is to use queue.Queue which handles all the locking internally | 15:34 |
fungi | though the gap is actually a little later than the logged errors... seeing it span 0915-0925 roughly while we were seeing deletion and ssh errors in the nodepool log an hour prior | 15:34 |
jeblair | ok, so i think i want to do the following: add a thread-dump handler to nodepool like zuul has | 15:35 |
jeblair | consider using dequeue instead of queue | 15:35 |
jeblair | i think the immediate cause of this may be a mystery for now | 15:35 |
jeblair | but if it happens again, hopefully the thread dump handler will help | 15:36 |
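The thread-dump handler jeblair proposes can be sketched with `sys._current_frames()` wired to a signal. This is a minimal hypothetical version, not the actual zuul or nodepool implementation, and it assumes SIGUSR2 is otherwise unused by the daemon:

```python
import signal
import sys
import threading
import traceback

def dump_threads(signum=None, frame=None):
    """Write a stack trace for every live thread to stderr, so a
    wedged daemon can be inspected (e.g. to see threads stuck in
    sem_wait) without restarting it and losing state."""
    names = {t.ident: t.name for t in threading.enumerate()}
    for ident, stack in sys._current_frames().items():
        sys.stderr.write("Thread %s (%s):\n" % (ident, names.get(ident, "?")))
        sys.stderr.write("".join(traceback.format_stack(stack)))

# `kill -USR2 <pid>` now produces a dump instead of killing the process.
signal.signal(signal.SIGUSR2, dump_threads)
```

With something like this in place, the "mystery for now" below could have been answered in the moment instead of reconstructed from gdb.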
fungi | at least we know where to focus debugging the next time this happens, and possibly minimize the disruption as well | 15:36 |
*** markmc has joined #openstack-infra | 15:36 | |
jeblair | yeah. my thinking is that it's either a thread-related bug (which is really weird because that's hard to imagine except for a bug in the stdlib) | 15:36 |
jeblair | or it could be a novaclient bug, where all of the novaclient client objects are stuck doing something | 15:37 |
jeblair | (which may have been triggered by the host/network weirdness) | 15:37 |
jeblair | so, for cleanup: | 15:37 |
fungi | and then the manual cleanup for now is, what, shut down nodepool, run a delete query to remove any machines in a building or delete state manually and nova delete any of the failed deletes themselves, then start nodepool again? | 15:38 |
jeblair | fungi: close | 15:38 |
jeblair | fungi: i'd get the list of machines we want to delete from nodepool, restart it, then 'nodepool delete' each of them | 15:38 |
fungi | oh, that's nicer | 15:39 |
jeblair | fungi: (nodepool should be capable of deleting anything it has a record for) | 15:39 |
fungi | will nodepool delete work on aliens too? | 15:39 |
fungi | i guess not, since no record | 15:39 |
jeblair | fungi: then we can also use nodepool alien-list to get the others, and unfortunately no, we'll have to nova delete those | 15:39 |
fungi | that's easy enough | 15:39 |
fungi | okay, i can tackle that while you get to breakfast | 15:40 |
jeblair | fungi: how did you know? :) | 15:40 |
jeblair | fungi: nodepool list |grep building|awk '{print $2}' | 15:40 |
fungi | heh | 15:40 |
jeblair | fungi: is very handy | 15:40 |
jeblair | fungi: nodepool list |grep building|awk '{print "nodepool delete " $2}' | 15:40 |
jeblair | fungi: actually that's even handier | 15:40 |
fungi | yup. i was using cut to a similar effect, but maybe a slightly more machine-parsable format option would be a nice furture addition | 15:41 |
fungi | s/furture/future/ | 15:41 |
jeblair | fungi: i'd recommend taking that list and splitting it into about 5 parts or so, and then background 5 scripts running through that | 15:41 |
*** blamar has quit IRC | 15:41 | |
jeblair | fungi: to balance speed vs likelihood of hitting an api rate limit | 15:41 |
fungi | yeah, don't want to get throttled | 15:41 |
fungi | right | 15:41 |
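jeblair's one-liner plus the "split into about 5 parts" advice amounts to rate-limited parallel deletion. A sketch of the same cleanup in Python -- the `nodepool delete` command is from the log above, while the chunking and threading scaffolding is illustrative (it assumes the stuck node ids have already been collected from `nodepool list`):

```python
import subprocess
import threading

def chunk(items, n):
    """Deal items round-robin into n batches."""
    batches = [[] for _ in range(n)]
    for i, item in enumerate(items):
        batches[i % n].append(item)
    return batches

def delete_batch(ids):
    # Each batch runs its deletes serially, keeping total concurrency
    # at the number of batches rather than the number of nodes.
    for node_id in ids:
        subprocess.call(["nodepool", "delete", node_id])

def delete_all(ids, workers=5):
    """Five parallel workers: fast enough to be useful, small enough
    to stay under the cloud provider's API rate limit."""
    threads = [threading.Thread(target=delete_batch, args=(b,))
               for b in chunk(ids, workers) if b]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Alien nodes (known to the cloud but not to nodepool) would still need `nova delete`, as noted later in the log.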
*** blamar has joined #openstack-infra | 15:42 | |
fungi | okay, shutting down nodepool now and getting started on that unless you need anything else from the running process first | 15:42 |
sdague | fungi: cookiecutter question ... why does this pass tests - https://github.com/sdague/os_loganalyze | 15:42 |
sdague | I set up an assertTrue(False) in there to ensure it broke correctly, and no dice | 15:42 |
jeblair | fungi: nope, go for it; i'd go ahead and restart nodepool immediately though so it can better keep up with the still-running check nodes | 15:43 |
fungi | got it--will do jeblair | 15:43 |
* ttx will get drunk now to forget he'll have to work over the weekend to catch up | 15:44 | |
*** sarob has joined #openstack-infra | 15:45 | |
*** amotoki has joined #openstack-infra | 15:45 | |
sdague | ttx: hopefully with some nice wine | 15:46 |
*** ruhe has quit IRC | 15:46 | |
jeblair | fungi: i can start work on the alien deletions if you want | 15:46 |
anteaya | ttx keep that Hawaiian shirt handy, you never know | 15:46 |
fungi | jeblair: sure, i've got the building deletions going now | 15:47 |
*** Steely_Dan is now known as Steely_Spam | 15:48 | |
jeblair | fungi: cool i'm on it then | 15:48 |
fungi | 5 separate batches of ~50 each | 15:48 |
pabelanger | So, should I expect tox to run properly after I use cookiecutter for the first time? | 15:48 |
fungi | pabelanger: you and sdague seem to possibly be asking the same question | 15:48 |
* ttx will bbl | 15:48 | |
pabelanger | fungi, okay cool | 15:49 |
pabelanger | I think it missed setting up versioning | 15:49 |
pabelanger | for defaulting to something | 15:49 |
fungi | if you don't figure it out among yourselves shortly, i'll have a look once i wrap up the current firefight | 15:49 |
sdague | fungi: so my issue is actually subunit discover doesn't seem to find any tests | 15:50 |
sdague | and "passes" because of it | 15:50 |
*** alcabrera has joined #openstack-infra | 15:50 | |
fungi | sdague: hrm, maybe the search path in the tox.ini is too strict? | 15:50 |
sdague | I don't think so | 15:51 |
sdague | if I manually venv, and run | 15:51 |
sdague | ./bin/python -m subunit.run discover -t ./ . --list | 15:51 |
sdague | nothing | 15:51 |
fungi | the zuul test nodes status graph seems to reflect things are on their way to recovery | 15:51 |
fungi | and i do see some jobs going in the gate queue now | 15:51 |
*** sandywalsh has quit IRC | 15:53 | |
sdague | ok, off to lunch | 15:53 |
*** sandywalsh has joined #openstack-infra | 15:53 | |
*** sandywalsh has quit IRC | 15:54 | |
mordred | sdague, what's going on with discover? | 15:54 |
mordred | and I see code? | 15:54 |
fungi | mordred: his repo is https://github.com/sdague/os_loganalyze | 15:54 |
jeblair | fungi: aliens deleted | 15:54 |
fungi | jeblair: thanks! | 15:54 |
fungi | the building ones are deleted now too | 15:54 |
*** cody-somerville has joined #openstack-infra | 15:55 | |
mordred | sdague: os_loganalyze/tests/ is missing an __init__.py file | 15:55 |
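The failure sdague hit is a consequence of how stdlib test discovery works: a directory without an `__init__.py` is silently skipped, so `subunit.run discover` lists nothing and the empty run "passes". A minimal sketch of the effect, assuming nothing about os_loganalyze itself (the package and file names below are throwaway):

```python
import os
import tempfile
import unittest


def discovered_test_count(pkg_name, with_init):
    # Create a throwaway package with one passing test, with or
    # without an __init__.py, and count what discovery finds.
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, pkg_name)
    os.mkdir(pkg)
    if with_init:
        open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "test_sample.py"), "w") as f:
        f.write("import unittest\n"
                "class T(unittest.TestCase):\n"
                "    def test_ok(self):\n"
                "        pass\n")
    # A fresh loader per call avoids cross-call module caching.
    return unittest.TestLoader().discover(root).countTestCases()
```

With the `__init__.py` present the loader finds the test; without it the directory is skipped, the suite is empty, and the run exits successfully anyway.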
jeblair | fungi: you probably want to 'nodepool delete' the ones in delete state as well, to speed things up | 15:56 |
fungi | jeblair: or at least they should be deleted but i see "building" state nodes in the nodepool list output with an age >7 hours still | 15:56 |
jeblair | fungi: hrm | 15:56 |
openstackgerrit | Monty Taylor proposed a change to openstack-dev/cookiecutter: Actually git add the __init__.py file https://review.openstack.org/51238 | 15:57 |
*** sandywalsh has joined #openstack-infra | 15:57 | |
mordred | sdague, fungi ^^ | 15:57 |
pabelanger | http://pastebin.com/YeMM1kiK | 15:57 |
jeblair | fungi: i just deleted one and it went away | 15:57 |
pabelanger | that's my error for cookiecutter | 15:57 |
*** sandywalsh has quit IRC | 15:57 | |
fungi | jeblair: nevermind--i think at least one of my delete jobs was hung in the background for a moment | 15:57 |
mordred | pabelanger: ah! so, you need to make it a git repo and actually commit the first commit before that will work | 15:58 |
pabelanger | mordred, Ah, I see | 15:58 |
pabelanger | okay | 15:58 |
*** sandywalsh has joined #openstack-infra | 15:58 | |
mordred | sorry, I keep meaning to hack something in so that it will a) do that for you or b) print a warning | 15:58 |
mordred | pabelanger: also - see the note to sdague above | 15:58 |
pabelanger | mordred, roger | 15:58 |
mordred | pabelanger: I forgot to git add a file :) | 15:58 |
fungi | jeblair: or not. the background jobs did all finish like i thought, i'm just getting output from nodepool on my terminal after restarting it. may not properly close its file descriptors? | 15:58 |
jeblair | fungi: never seen that | 15:59 |
jeblair | fungi: ps suggests there's at least one delete script going | 15:59 |
*** bnemec has joined #openstack-infra | 15:59 | |
fungi | huh. jobs does not list it | 15:59 |
fungi | oh, yes it does actually | 16:00 |
fungi | okay, so it's still churning apparently. may have gotten throttled after all | 16:00 |
fungi | it went silent for several minutes there | 16:01 |
jeblair | ah | 16:01 |
fungi | now it's done | 16:02 |
annegentle_ | way to go looks like you turned a corner! http://bit.ly/1afAl8w | 16:02 |
jeblair | fungi: btw, errors about '2249297' are my fault | 16:02 |
fungi | and yes, no building nodes older than 15 minutes now | 16:02 |
annegentle_ | some days I just want to be cheerleader bystander but deadlines keep getting in the way | 16:02 |
fungi | jeblair: okay, noted | 16:02 |
jeblair | fungi: i accidentally nova deleted it, but it really was building; | 16:02 |
fungi | k | 16:02 |
mordred | annegentle_: that's a sexy graph! | 16:03 |
fungi | i've got a round of 5 parallel scripts deleting the "delete" state nodes now | 16:04 |
clarkb | morning | 16:06 |
jeblair | clarkb: impeccable timing! :) | 16:07 |
pabelanger | okay cool | 16:07 |
clarkb | jeblair: looks like it | 16:07 |
pabelanger | flake8 a little sad | 16:07 |
pabelanger | but that is okay for now | 16:07 |
clarkb | bauzas: are you running tox with -r locally and are you running that in a clean git checkout? | 16:08 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 16:10 | |
*** _SergeyLukjanov has quit IRC | 16:10 | |
clarkb | jeblair: fungi: so nodepool was having a hard time with the hpcloud endpoints? | 16:11 |
clarkb | but is all better now? | 16:11 |
fungi | clarkb: that was my earlier theory, but no longer suspect that to be the case | 16:11 |
fungi | jeblair did some investigation in a debugger, found a possible deadlock but without a thread dump it was hard to pinpoint the contention | 16:12 |
clarkb | I see. Does nodepool need the zuul threaddump signal catcher? | 16:13 |
fungi | basically, that was his suggestion | 16:13 |
clarkb | that should be easy to port over. I can poke at it later | 16:13 |
jeblair | clarkb: i've got it -- almost done | 16:15 |
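The zuul-style catcher being ported here is, in essence, a handler that dumps every thread's stack on SIGUSR2 so a hung daemon can be inspected without restarting it. A hedged sketch of that kind of handler (not the actual zuul or nodepool code):

```python
import signal
import sys
import threading
import traceback


def thread_dump():
    # Map thread idents to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread: %s (%s)\n" % (names.get(ident, "?"), ident))
        lines.extend(traceback.format_stack(frame))
    return "".join(lines)


def handler(signum, frame):
    # Write the dump to stderr; a signal handler should do little else.
    sys.stderr.write(thread_dump())


if hasattr(signal, "SIGUSR2"):  # SIGUSR2 is POSIX-only
    signal.signal(signal.SIGUSR2, handler)
```

With that in place, `kill -USR2 <pid>` makes the daemon print every thread's stack, which is exactly what was missing during the deadlock hunt above.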
clarkb | cool | 16:16 |
*** matty_dubs is now known as matty_dubs|lunch | 16:16 | |
pabelanger | mordred, looks like something in the import process is messing up with flake8 | 16:16 |
pabelanger | http://pastebin.com/xxZqgJ0T | 16:16 |
pabelanger | http://pastebin.com/PVB4wDQp | 16:17 |
clarkb | pabelanger: the no newline at end of file? I think mordred filed a bug against upstream about that | 16:17 |
pabelanger | okay cool | 16:17 |
pabelanger | that was about the only other thing I see about tox being unhappy | 16:18 |
mordred | yup. I have an upstream PR up | 16:18 |
clarkb | jeblair: I think https://review.openstack.org/#/c/42393/ can probably be approved if you are happy with it | 16:19 |
clarkb | mordred: https://review.openstack.org/#/c/33926/ I haven't approved that because I don't have time to babysit (e.g. check results of change before next gerrit restart) | 16:19 |
clarkb | mordred: but if you do, feel free to approve | 16:19 |
clarkb | jeblair: https://review.openstack.org/#/c/45294/ has comments for you as well | 16:20 |
mordred | clarkb: same here | 16:21 |
pabelanger | jeblair, hope to get back into nodepool reviews today | 16:21 |
mordred | clarkb: I think when I approve that, I'll run a manual puppet agent --test on review.o.o and watch the patch output (should be null-ish) | 16:21 |
*** bnemec is now known as beekneemech | 16:22 | |
*** odyssey4me2 has joined #openstack-infra | 16:22 | |
* fungi finally had a moment to put on his "i voted" sticker | 16:23 | |
*** odyssey4me is now known as Guest36051 | 16:23 | |
*** odyssey4me2 is now known as odyssey4me | 16:23 | |
mordred | clarkb: https://review.openstack.org/#/c/48355/ could use a look from you or fungi - you guys had great comments last time | 16:25 |
*** markmc has quit IRC | 16:25 | |
*** _david_ has joined #openstack-infra | 16:27 | |
_david_ | zaro, ping | 16:27 |
clarkb | _david_: zaro isn't around this week | 16:28 |
_david_ | clarkb, thx, i fixed his patch, and wanted to ask if he can test it? | 16:28 |
_david_ | https://gerrit-review.googlesource.com/#/c/48254/ | 16:29 |
*** enikanorov-w has quit IRC | 16:29 | |
_david_ | clarkb, i tested upgrade to Gerrit schema 85 and now a permission can be granted to new system group "Change Owner". | 16:30 |
*** blamar has quit IRC | 16:30 | |
_david_ | jeblair, mordred, clarkb we host wip-plugin on gerrit-review | 16:31 |
_david_ | git clone https://gerrit.googlesource.com/plugins/wip | 16:31 |
*** markmcclain has joined #openstack-infra | 16:31 | |
*** derekh has quit IRC | 16:32 | |
clarkb | cool | 16:32 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add a thread dump signal handler https://review.openstack.org/51248 | 16:33 |
anteaya | fungi: yay! | 16:34 |
*** anteaya has quit IRC | 16:34 | |
jeblair | clarkb, fungi: fungi may still be right -- it's possible that it had a hard time with the hpcloud endpoints which caused a bug. i don't know if it was a deadlock or not; it's really hard to say. | 16:34 |
jeblair | clarkb, fungi: it's entirely possible that we were just sitting in a novaclient call, forever. | 16:35 |
fungi | #status ok the gate is moving again for the past half hour or so--thanks for your collective patience while we worked through the issue | 16:36 |
openstackstatus | NOTICE: the gate is moving again for the past half hour or so--thanks for your collective patience while we worked through the issue | 16:36 |
*** ChanServ changes topic to "Discussion of OpenStack Project Infrastructure | Docs http://ci.openstack.org/ | Bugs https://launchpad.net/openstack-ci | Code https://git.openstack.org/cgit/openstack-infra/" | 16:36 | |
*** mrodden has quit IRC | 16:36 | |
jeblair | mordred, clarkb: https://review.openstack.org/#/c/45294/ | 16:36 |
jeblair | mordred, clarkb: i don't care how stackforge projects do project management | 16:37 |
*** dkehn_ has joined #openstack-infra | 16:37 | |
jeblair | mordred: but i do care that if people have tag permissions and don't know how to use them, then we get called in to clean it up, which is complicated, takes time, and is not scalable | 16:37 |
*** jpich has quit IRC | 16:38 | |
*** dkehn has quit IRC | 16:38 | |
jeblair | mordred: so i think the best compromise between full access and no access to push tags, is that we ask that they limit the group of people who can tag to a small set who fully understand the process | 16:38 |
jeblair | mordred: is that unreasonable? | 16:38 |
mordred | jeblair: I don't think it's unreasonable, - but in this case they're asking for the group to match the group that has tag access for libra itself | 16:41 |
mordred | jeblair: so, effectively, I believe it is the thing you are asking for, AIUI | 16:41 |
mordred | LinuxJedi: ^^ right? do I grok? | 16:41 |
jeblair | mordred: that's fine then. | 16:42 |
*** thomasbiege has quit IRC | 16:42 | |
LinuxJedi | mordred: yep | 16:42 |
LinuxJedi | mordred: which is really small anyway, and only the people that do it now | 16:42 |
mordred | excellent | 16:42 |
*** _david_ has quit IRC | 16:42 | |
jeblair | LinuxJedi: the thing i wanted to ensure is that it's a small group that understands the process/dangers. sounds like that's the case. thanks. | 16:43 |
*** _david_ has joined #openstack-infra | 16:43 | |
LinuxJedi | jeblair: oh hell yes. That group is only me, Shrews, marcp and pcrews. We are the only ones that would do tagging | 16:44 |
jeblair | i have +2d | 16:44 |
Shrews | LinuxJedi: Actually, only you and I are part of the -milestone group | 16:44 |
LinuxJedi | even better | 16:44 |
_david_ | clarkb, can you ask zaro to test that patch? | 16:45 |
_david_ | because the Gerrit maintainer would like to cut stable-2.8 | 16:45 |
clarkb | _david_: yes, I will let him know when he is back | 16:45 |
_david_ | clarkb, weekend? | 16:45 |
clarkb | _david_: oh, well he is AFK until monday iirc | 16:45 |
*** thomasbiege has joined #openstack-infra | 16:46 | |
LinuxJedi | mordred: maybe a future release for gerrit/git-review should be to have a code review system for tags if there are worries. | 16:46 |
clarkb | LinuxJedi: yes! that would be awesome | 16:46 |
*** mrodden has joined #openstack-infra | 16:47 | |
*** hogepodge has joined #openstack-infra | 16:47 | |
*** dkehn has joined #openstack-infra | 16:52 | |
*** _david_ has left #openstack-infra | 16:52 | |
*** dkehn_ has quit IRC | 16:54 | |
*** matty_dubs|lunch is now known as matty_dubs | 16:54 | |
openstackgerrit | A change was merged to openstack-infra/config: Remove tuskarclient pylint job. https://review.openstack.org/49965 | 16:56 |
*** Ryan_Lane has quit IRC | 16:58 | |
*** dkehn_ has joined #openstack-infra | 17:01 | |
*** dkehn has quit IRC | 17:02 | |
mordred | LinuxJedi: yes. we're actually planning that ish | 17:03 |
mordred | LinuxJedi: or, a tool that lets you do "please make a new minor release for me" | 17:03 |
*** rahmu has joined #openstack-infra | 17:03 | |
mordred | LinuxJedi: so it knows how to find your current version, logically increment the thing you asked it to, run the tag command with -s, etc | 17:04 |
jeblair | mordred: a pbr function that implements 'python setup.py release' ? | 17:18 |
mordred | jeblair: yah. something that like | 17:19 |
mordred | something like that | 17:20 |
mordred | although I was considering making it two commands or splittable - so you could do the local tagging separate from pushing the local tag | 17:20 |
jeblair | mordred: i think that's a good idea | 17:21 |
mordred | (I usually do the tag and then do an sdist to check that it worked and stuff) | 17:21 |
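The split mordred describes (increment the version locally, tag with `-s`, push later as a separate step) could look roughly like this; `bump_version` and `tag_release` are illustrative names, not pbr's actual API:

```python
import subprocess


def bump_version(version, part):
    # Increment one component of an X.Y.Z version and zero out the
    # components after it, e.g. bump_version("1.2.3", "minor") -> "1.3.0".
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError("unknown part: %r" % part)
    return "%d.%d.%d" % (major, minor, patch)


def tag_release(version):
    # Create the signed tag locally only; pushing the tag remains a
    # separate, deliberate step (so you can sdist-check it first).
    subprocess.check_call(["git", "tag", "-s", version, "-m", version])
```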
*** hemnafk is now known as hemna | 17:22 | |
*** osanchez has quit IRC | 17:27 | |
sdague | mordred: woot | 17:28 |
sdake | jeblair re our conversation at cloudopen regarding using heat to run the gate jobs, is zuul the software that does all that? | 17:31 |
*** Ryan_Lane has joined #openstack-infra | 17:36 | |
fungi | sdake: zuul coordinates and acts as a scheduler, while jenkins handles the execution and artifact collection | 17:36 |
fungi | at least presently | 17:36 |
sdake | does jenkins execute some scripts to do the building of the vms? | 17:36 |
fungi | sdake: nodepool (and in some cases humans) do that part | 17:37 |
sdague | mordred: so curiously cookiecutter seems to trim newlines at the end of files, so it un pep8's our template | 17:37 |
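The symptom sdague describes shows up as flake8's W292 ("no newline at end of file"). Until the upstream cookiecutter fix lands, a trivial repair pass over generated files is enough; a sketch:

```python
def ensure_final_newline(text):
    # flake8 W292 fires when a file's last line has no trailing
    # newline; re-append one if the template engine dropped it.
    if text and not text.endswith("\n"):
        return text + "\n"
    return text
```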
*** Ryan_Lane has quit IRC | 17:38 | |
fungi | sdague: specifically nodepool has some pool management heuristics including a semi-predictive evaluation of current demand and uses that to try and maintain sufficient levels of available virtual machines | 17:38 |
*** arosen1 has joined #openstack-infra | 17:39 | |
fungi | as well as garbage-collecting the machines once they've been used | 17:39 |
*** thomasbiege1 has joined #openstack-infra | 17:39 | |
sdake | fungi does it use bare metal nodes as the backend, or openstack instances? | 17:40 |
*** arosen has quit IRC | 17:40 | |
mordred | sdague: that is correct. I have submitted a PR to upstream to fix it | 17:40 |
fungi | sdake: it presently uses openstack/nova-based service providers who donate resources to us | 17:40 |
fungi | sdake: though there is work underway to start testing tripleo on bare metal i think? | 17:41 |
*** melwitt has joined #openstack-infra | 17:41 | |
fungi | and have nodepool coordinate to a nova-bm/ironic environment the tripleo peeps are maintaining | 17:42 |
*** melwitt has quit IRC | 17:42 | |
fungi | though i've not been paying as close attention to that as i should, so i'm light on details there. lots of other stuff going on | 17:42 |
sdake | fungi which part of nodepool does the orchestration of the vm? | 17:42 |
*** reed has joined #openstack-infra | 17:42 | |
*** thomasbiege has quit IRC | 17:43 | |
fungi | sdake: it has an image builder which calls into the template vm and runs some shell scripts and puppet to get it into a desired state, then shuts it down and uses it to clone others | 17:43 |
hogepodge | pebelanger clarkb Do you know of anyone with free cycles to finish the review of https://review.openstack.org/#/c/49020/ ? | 17:43 |
*** SergeyLukjanov has joined #openstack-infra | 17:43 | |
fungi | sdake: and refreshes that daily | 17:43 |
*** melwitt has joined #openstack-infra | 17:44 | |
clarkb | hogepodge: fungi maybe? | 17:44 |
fungi | if by orchestration you mean setup, and not the running of the tests/jobs on a particular vm | 17:44 |
sdake | so when zuul says "hey I have another job for you" how does that get launched? | 17:44 |
fungi | sdake: zuul tells jenkins to run it and on which vm | 17:44 |
*** Ryan_Lane has joined #openstack-infra | 17:44 | |
sdague | clarkb: ok, first unit tests into the htmlifier to shore up it's behavior, and I already found a bug :) | 17:44 |
sdague | yay, tests | 17:45 |
clarkb | woot | 17:45 |
sdake | fungi so jenkins logs into the box and does some ssh commands or something? | 17:45 |
fungi | sdake: zuul knows a list of the jobs and under what circumstances they should be run and which systems can run them, and jenkins has details on what each actual job does | 17:45 |
*** boris-42 has joined #openstack-infra | 17:45 | |
fungi | sdake: a jenkins master can use multiple means of controlling its slaves, but we rely on ssh | 17:46 |
fungi | sdake: jenkins also has a java-based agent which runs on each slave and communicates state with the master | 17:46 |
sdake | cool let my brain cook on that for awhile | 17:46 |
sdake | thanks for the info fungi | 17:46 |
fungi | sdake: you're welcome. these are also covered with pretty diagrams and examples in a couple of brief slide presentations published at http://docs.openstack.org/infra/publications/ | 17:47 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add a thread dump signal handler https://review.openstack.org/51248 | 17:48 |
sdake_ | thanks bookmarked for later | 17:48 |
*** jerryz has joined #openstack-infra | 17:49 | |
*** rickerc has quit IRC | 17:49 | |
*** thomasbiege1 has quit IRC | 17:50 | |
*** dkehn_ is now known as dkehn | 17:51 | |
*** amotoki has quit IRC | 17:51 | |
*** alcabrera has quit IRC | 17:52 | |
*** thomasbiege has joined #openstack-infra | 17:53 | |
fungi | hogepodge: left a comment on it. i think you got some of your cleanup backwards in the new patchset | 17:53 |
hogepodge | fungi: I think I did too. | 17:54 |
hogepodge | fungi: :-) | 17:54 |
hogepodge | fungi: This is why I love gerrit | 17:54 |
hogepodge | fungi: Thanks. | 17:54 |
*** thomasbiege has quit IRC | 17:55 | |
fungi | my pleasure | 17:55 |
*** rickerc has joined #openstack-infra | 17:55 | |
*** odyssey4me has quit IRC | 17:56 | |
jerryz | fungi: could you tell me which dns server is used for devstack gate slaves? | 17:58 |
*** moted has quit IRC | 17:58 | |
*** nati_ueno has joined #openstack-infra | 17:59 | |
jerryz | fungi: sometimes i hit this bug/question https://bugs.launchpad.net/devstack/+bug/1190844 | 17:59 |
uvirtbot | Launchpad bug 1190844 in devstack "./stack.sh is resulting any error "/opt/stack/devstack/functions: line 1228: : No such file or directory" on stable/grizzly branch" [Undecided,Invalid] | 17:59 |
fungi | jerryz: depends on the provider i think, but i'll check | 17:59 |
reed | hello folks | 18:00 |
fungi | hello reed | 18:00 |
*** johnthetubaguy has quit IRC | 18:00 | |
fungi | jerryz: it may even vary by region/availability zone... in rackspace dfw we use 72.3.128.240 and 72.3.128.241 | 18:01 |
jerryz | fungi: thanks. i think the dns i use which is 8.8.8.8 gives me the wrong IP for cdn.download.cirros-cloud.net | 18:01 |
sdague | jerryz: the cdn for cirros got flakey some time yesterday | 18:02 |
fungi | jerryz: ahh, yes there were some cdn issues for cirros image downloads which got worked through yesterday. are you still encountering it in current runs? | 18:02 |
jerryz | fungi: for now, i just put the right ip in /etc/hosts | 18:03 |
*** gyee has joined #openstack-infra | 18:03 | |
jerryz | fungi: the cdn chosen in seattle, WA works for me | 18:04 |
fungi | okay, cool | 18:04 |
*** dkehn has quit IRC | 18:04 | |
*** alcabrera has joined #openstack-infra | 18:09 | |
jerryz | fungi sdague: the slaves from hp or rackspace for jenkins.o.o also have that problem? i had thought the dns i used was not smart enough to refresh available cdn ip addresses. | 18:10 |
*** pycabrera has joined #openstack-infra | 18:13 | |
*** zehicle_at_dell has joined #openstack-infra | 18:15 | |
*** esker has joined #openstack-infra | 18:15 | |
*** alcabrera has quit IRC | 18:16 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Rename ASRT -> AGT https://review.openstack.org/51267 | 18:20 |
jeblair | clarkb: ^ to try to make the debug log from nodepool more clear | 18:20 |
*** dizquierdo has left #openstack-infra | 18:22 | |
notmyname | jeblair: I see https://github.com/openstack/swift-bench exists now. does that mean we're good to go? well, after I commit a .gitreview doc | 18:26 |
jeblair | notmyname: yes, i think it merged last night sometime | 18:26 |
jeblair | notmyname: hold on that... | 18:27 |
notmyname | [gerrit] | 18:27 |
notmyname | host=review.openstack.org | 18:27 |
notmyname | port=29418 | 18:27 |
notmyname | project=openstack/swift-bench.git | 18:27 |
notmyname | jeblair: proposed .gitreview ^^ | 18:27 |
*** CaptTofu has quit IRC | 18:27 | |
jeblair | fungi, clarkb, mordred: http://git.openstack.org/cgit/openstack/swift-bench/tree/ | 18:27 |
jeblair | looks empty | 18:27 |
*** CaptTofu has joined #openstack-infra | 18:28 | |
notmyname | jeblair: empty? I see stuff | 18:28 |
jeblair | notmyname: the joys of load balancing; it's empty on git03.o.o | 18:29 |
notmyname | jeblair: so should I push a change or not? | 18:30 |
notmyname | jeblair: assuming that proposed .gitreview is good | 18:31 |
jeblair | notmyname: i think you're good. since you don't have any jobs yet, nothing automated is going to try to hit git.o.o. i'll fix git03 shortly. | 18:32 |
*** CaptTofu has quit IRC | 18:32 | |
notmyname | jeblair: great | 18:32 |
jeblair | notmyname: (earlier i was worried it was a sign something more serious broke) | 18:32 |
*** mestery has joined #openstack-infra | 18:32 | |
notmyname | jeblair: https://review.openstack.org/#/c/51268/ | 18:32 |
notmyname | jeblair: if you can give me your +1 there, I'll merge it and we should be off to the races | 18:33 |
jeblair | notmyname: done | 18:34 |
*** melwitt1 has joined #openstack-infra | 18:34 | |
notmyname | jeblair: thanks for your help | 18:35 |
*** dafter has quit IRC | 18:35 | |
jeblair | notmyname: no prob! | 18:35 |
*** itchsn has joined #openstack-infra | 18:36 | |
*** melwitt has quit IRC | 18:37 | |
*** dcramer_ has quit IRC | 18:37 | |
*** sarob has quit IRC | 18:38 | |
*** itchsn has quit IRC | 18:38 | |
*** CaptTofu has joined #openstack-infra | 18:38 | |
*** dkehn has joined #openstack-infra | 18:39 | |
jeblair | clarkb, mordred, fungi: ok, the swift-bench thing on git03 was just the replication race condition; i replicated again and it's updated. i'm looking forward to having salt do this. :) | 18:41 |
jeblair | notmyname: ^ all the git.o.o servers have swift-bench now | 18:42 |
notmyname | yay | 18:42 |
*** dafter has joined #openstack-infra | 18:44 | |
*** alexpilotti_ has joined #openstack-infra | 18:45 | |
ttx | jeblair: hey, nice work on unbreaking the gate! What caused the initial fail ? | 18:47 |
*** alexpilotti has quit IRC | 18:47 | |
*** alexpilotti_ is now known as alexpilotti | 18:47 | |
ttx | (if we know that) | 18:48 |
*** dcramer_ has joined #openstack-infra | 18:53 | |
mordred | jeblair: ++ salt | 18:55 |
*** Bada has joined #openstack-infra | 19:00 | |
dkranz | There seems to be a problem with https://review.openstack.org/#/c/50795/ merging | 19:05 |
dkranz | jenkins reported success an hour ago but zuul shows some of the jobs as "queued". That is strange. | 19:06 |
*** dcramer_ has quit IRC | 19:06 | |
*** melwitt has joined #openstack-infra | 19:06 | |
*** melwitt1 has quit IRC | 19:06 | |
clarkb | dkranz: the +1 verified is for your recheck. still waiting on gate tests | 19:09 |
dkranz | clarkb: ok, thanks. Guess things are really slow | 19:09 |
hub_cap | mordred: whats the status on the work we talked about in seattle? the images stuff.. im at a point where i can take any/all of it on | 19:16 |
*** dhouck_ has quit IRC | 19:16 | |
*** jog0 is now known as flashgordon | 19:17 | |
hub_cap | clarkb: ^ ^ | 19:17 |
hub_cap | flashgordon: silly handle friday? | 19:17 |
flashgordon | hub_cap: casual nick friday | 19:18 |
flashgordon | most of the nova folk do it | 19:19 |
hub_cap | oh yes im aware :) | 19:19 |
*** dcramer_ has joined #openstack-infra | 19:20 | |
* fungi thinks every day is casual nick friday (and hawaiian shirt tuesday) | 19:20 | |
mordred | hub_cap: it's - uhm. | 19:20 |
hub_cap | i thought so :) | 19:20 |
mordred | we need to add a thing to the d-g caching scripts to download the images and cache them | 19:21 |
mordred | then you're good to go | 19:21 |
*** alexpilotti has quit IRC | 19:21 | |
hub_cap | like i said, i can help w any of it :) is someone working on the d-g caching script stuff? | 19:21 |
mordred | nope | 19:24 |
*** dprince has quit IRC | 19:24 | |
hub_cap | mind if i take a stab @it? | 19:25 |
mordred | hub_cap: please do! https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/nodepool/scripts/devstack-cache.py | 19:25 |
mordred | hub_cap: is the script you want to look at | 19:25 |
hub_cap | <3 | 19:25 |
mordred | it currently has a place where it pre-downloads images referenced by devstack | 19:25 |
hub_cap | cool ill peep it and ask questions :) | 19:26 |
mordred | hub_cap: steps forward would be either just add direct curl commands to download the images | 19:26 |
mordred | hub_cap: OR - you could get fancy and read image elemens | 19:26 |
mordred | elements | 19:26 |
mordred | in https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/nodepool/scripts/prepare_devstack.sh | 19:26 |
mordred | you'll see where we pre-clone a bunch of git repos | 19:26 |
mordred | you could add needed repos there, and then read files in them to find out what images they want to download | 19:27 |
mordred | depends on how clever you want to be | 19:27 |
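The "clever" approach mordred sketches, reading element files to learn which images to pre-cache, might start with something like this; the URL pattern and the idea of scanning raw element text are assumptions for illustration, not what diskimage-builder guarantees:

```python
import re

# Rough pattern for image artifacts an element might download.
IMAGE_URL_RE = re.compile(r"https?://\S+?\.(?:img|qcow2|tar\.gz)")


def find_image_urls(text):
    # Scan an element's source text for candidate image URLs so the
    # nodepool cache script can fetch them ahead of time.
    return IMAGE_URL_RE.findall(text)
```

A caching script would then download each URL into the image cache directory, mirroring what devstack-cache.py already does for packages.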
hub_cap | ya i was wondering about that.. are we wanting to test every flavor of image elements? or just the "supported" ones from openstack perspective | 19:27 |
hub_cap | for trove, i dont think i need to test fedora/centos, i can test centos and call it a day. but thats my perspective.. | 19:28 |
hub_cap | do we have a list of official supported linux flavors? | 19:29 |
mordred | from a d-g perspective | 19:29 |
mordred | we want to precache things that jobs that run on the nodes might want to download | 19:29 |
mordred | (this is why we go through and pre-download all of the debs that devstack _might_ wind up installing, but not install them) | 19:29 |
mordred | but "might" | 19:29 |
*** mriedem has quit IRC | 19:30 | |
mordred | is as defined by the set of things actually referenced in elements in repos that we might actually run | 19:30 |
mordred | clarkb, fungi, jeblair: did you see [openstack-dev] "Thanks for fixing my patch" ? | 19:30 |
*** pycabrera is now known as alcabrera | 19:31 | |
mordred | seems like a policy amendment that might apply nicely for us too | 19:31 |
*** davidhadas has joined #openstack-infra | 19:31 | |
hub_cap | mordred: that makes sense but if someone busts out a scientific linux element, do we want to download/cache that? | 19:32 |
hub_cap | oh and imma use the image elements, just fyi, cuz they have a nice little set of image url details i dont care to duplicate | 19:33 |
clarkb | mordred I did | 19:33 |
clarkb | mordred I think we basically do that already but only when in a time crunch | 19:34 |
hub_cap | currently there are fedora/centos/ubuntu, i can cache all 3 if we think we _may_ need to test on them all | 19:34 |
clarkb | we could shift to being proactive about it | 19:34 |
fungi | mordred: in fact, i do try to do that when i'm in a situation to do so (availability and knowledge-wise) | 19:36 |
fungi | i think it's a great idea | 19:36 |
fungi | i assumed it was already an accepted workflow among our team | 19:36 |
ttx | fungi: did you guys get to the bottom of today's issue, root cause ? | 19:36 |
fungi | ttx: no, we narrowed it down but there were insufficient debugging capabilities available, so jeblair has added those to the daemon for "next time" | 19:37 |
ttx | fungi: ack | 19:38 |
ttx | at least whatever it was that caused it, it's gone now | 19:38 |
fungi | well, whatever caused it got it into a perpetual state which was cleared through a restart, but next time we can generate a thread dump and restart it right away, then have the luxury of debugging while things don't remain indefinitely unusable | 19:39 |
ttx | cinder rc2 on its way, hold to your seats | 19:44 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Change test_queries from logical AND to OR https://review.openstack.org/50160 | 19:46 |
*** arosen1 has quit IRC | 19:47 | |
*** arosen has joined #openstack-infra | 19:48 | |
clarkb | mordred: do you have any more ideas on openstack_citest mysql perms? I think granting create and drop globally is necessary | 19:53 |
mordred | clarkb: I believe you are correct | 19:59 |
*** zehicle_at_dell has quit IRC | 19:59 | |
mordred | clarkb: otherwise, we could use mysql sandbox to spin up per-testrun mysqls and tear them down afterwards... | 19:59 |
* mordred hides | 20:00 | |
flashgordon | if anyone is looking for reviews to do, hacking has some reviews that need some attention https://review.openstack.org/#/q/status:open+project:openstack-dev/hacking,n,z | 20:01 |
fungi | mordred: isn't that basically what ceilo's mongodb functional tests use? | 20:02 |
jeblair | fungi, mordred, clarkb: yeah, i believe that has been accepted around infra repos, and i would expect it to be considered on-form in openstack repos too | 20:02 |
jeblair | fungi, mordred, clarkb: one good reason not to do that for many patches in infra is to help people learn about our systems -- | 20:03 |
flashgordon | clarkb: btw you are the most active reviewer in all of openstack | 20:03 |
flashgordon | http://russellbryant.net/openstack-stats/all-reviewers-180.txt | 20:03 |
jeblair | a lot of folks come in and say "i want to try to figure this out", and you know, teaching to fish and all. | 20:03 |
fungi | agreed. best reserved for urgent issues. it's not like we have tons of time to spare fixing up non-urgent changes | 20:05 |
ttx | cinder rc2 out | 20:05 |
* ttx calls it a day | 20:05 | |
jeblair | clarkb: congrats! | 20:05 |
mordred | clarkb: w00t! | 20:06 |
mordred | wow. I'm 14th | 20:06 |
clarkb | flashgordon: I saw that and was a little surprised. some of that is from mass rechecks though | 20:06 |
mordred | clarkb: ssh | 20:06 |
clarkb | :) | 20:06 |
jeblair | clarkb: you leave votes with rechecks? | 20:06 |
clarkb | jeblair no | 20:06 |
mordred | clarkb: actually, that's only tracking votes | 20:07 |
jeblair | clarkb: then... no? :) | 20:07 |
clarkb | is that only votes o_O | 20:07 |
clarkb | wow | 20:07 |
flashgordon | clarkb: most are in infra it looks like | 20:07 |
lifeless | wow, I'm up there | 20:07 |
clarkb | my review queue is huge. I try to stab at it as often as possible | 20:07 |
lifeless | and wth is dripton ? | 20:07 |
lifeless | http://russellbryant.net/openstack-stats/all-reviewers-30.txt | 20:08 |
lifeless | 11th, yay. | 20:08 |
hub_cap | thats it, im +1'ing random shit for the next 30 days | 20:08 |
lifeless | yeah, no. | 20:09 |
hub_cap | hahaha | 20:09 |
hub_cap | lifeless: great work, +1.. everything | 20:09 |
jeblair | hub_cap: that'll show up in the +/- column | 20:09 |
hub_cap | i know ill be a +1 baller jerryz | 20:09 |
hub_cap | *jeblair | 20:09 |
hub_cap | tab-fail | 20:09 |
sdague | hub_cap: yeh, that's why the % pos and conflict columns are there to try to sanity check things | 20:10 |
sdague | if you see 90+% pos, the person has missed the point of reviewing | 20:10 |
jeblair | for quite some time, clarkb and i have both maintained an 80% average | 20:10 |
hub_cap | 66%.. thats at least 20% worth of system-gaming i can do... numbers make u look good right?? | 20:11 |
hub_cap | ;) | 20:11 |
flashgordon | sdague: looks like i am missing the point http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt | 20:12 |
mordred | flashgordon: me too | 20:12 |
jeblair | hub_cap: yeah, i think you could be 20% nicer, but you wouldn't be you. | 20:12 |
flashgordon | lifeless: ^^ | 20:12 |
hub_cap | jeblair: TRU | 20:12 |
mordred | although part of my problem is that when I'm -1 I tend to poke someone in IRC to ask/chat about it | 20:12 |
mordred | I really need to leave that in the system more | 20:13 |
hub_cap | mordred: i stopped doing that | 20:13 |
hub_cap | i was super low on tracked reviews | 20:13 |
jeblair | mordred: i do both | 20:13 |
mordred | jeblair: I need to do both | 20:13 |
hub_cap | i looked like a schmuck (well more of one than normal) | 20:13 |
hub_cap | yeah ptl of trove has 2 reviews in the past 30 days | 20:13 |
mordred | jeblair: can we set up a bot that will let me comment on gerrit things? | 20:13 |
mordred | jeblair: so I can say #bot -1 13415 I have issues with this | 20:13 |
mordred | ? | 20:13 |
hub_cap | emacs has a command for that mordred | 20:14 |
mordred | hub_cap: good point | 20:14 |
jeblair | mordred: yeah, what could go wrong with giving an irc bot super-super-admin access in gerrit? | 20:14 |
lifeless | flashgordon: ? | 20:14 |
mordred | jeblair: can't see anything wrong with that | 20:14 |
fungi | how did i get to #6? i feel perpetually behind on reviews :/ | 20:15 |
jeblair | mordred: you're now at 92% positive reviews! ;) | 20:15 |
mordred | flashgordon: I do find it interesting that my 30 day percentage is about == to my 180 day | 20:15 |
mordred | jeblair: I am? | 20:15 |
*** adalbas has quit IRC | 20:15 | |
mordred | oh - just in infra? | 20:15 |
fungi | ugh, i'm also by far the most "positive" reviewer in the top 10 besides mordred | 20:15 |
jeblair | mordred: sorry, EJOKE | 20:15 |
flashgordon | lifeless: my review stats are not good http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt | 20:15 |
lifeless | flashgordon: you are fairly positive | 20:15 |
lifeless | flashgordon: OTOH you've been reviewing code mainly written by experienced core folk | 20:16 |
jeblair | mordred: er, the idea was that you gave a cursory +1 to the idea of giving an irc bot super-super-admin access to gerrit. | 20:16 |
flashgordon | lifeless: I have been doing a lot of -0s and not -1s | 20:16 |
fungi | mordred: we need to be more curmudgeoney apparently | 20:16 |
flashgordon | lifeless: yeah | 20:16 |
lifeless | flashgordon: I don't think that counts against you; in fact I'd kindof like to see a partitioned metric | 20:16 |
lifeless | reviews vs core | 20:16 |
lifeless | reviews vs noncore | 20:17 |
hub_cap | fungi: maybe yall just do such good work that theres nothing to -1 and this system doesnt accurately track that | 20:17 |
lifeless | I suspect it would be interesting | 20:17 |
lifeless | rustlebee: ^ But I have no plans to implement just yet :P | 20:17 |
flashgordon | lifeless: it would be interesting | 20:17 |
rustlebee | track all the things | 20:18 |
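The partitioned metric lifeless suggests (reviews against core-authored changes vs. reviews against noncore-authored changes) can be sketched as below. The tuple input format is an assumption for illustration; real data would come from Gerrit queries, as the reviewstats scripts do.

```python
# Sketch of a partitioned review metric: split each reviewer's stats by
# whether the change owner is a core reviewer. Input shape is invented.
from collections import defaultdict

def partition_stats(reviews, core_members):
    """reviews: iterable of (reviewer, change_owner, vote) tuples."""
    stats = defaultdict(lambda: {"core": [0, 0], "noncore": [0, 0]})
    for reviewer, owner, vote in reviews:
        bucket = "core" if owner in core_members else "noncore"
        total, negative = stats[reviewer][bucket]
        stats[reviewer][bucket] = [total + 1, negative + (vote < 0)]
    return {
        reviewer: {
            bucket: {"reviews": t,
                     "negative_pct": round(100.0 * n / t, 1) if t else 0.0}
            for bucket, (t, n) in buckets.items()
        }
        for reviewer, buckets in stats.items()
    }
```

With such a split, a high positive percentage against core-authored patches (lifeless's point) would stop counting against a reviewer.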
* mordred needs to go back to reviewing first thing in the morning | 20:19 | |
mordred | and clearing the entire outstanding queue down | 20:19 |
clarkb | mordred: I can't do that because by 9:30 am PST all the fun stuff is happening | 20:20 |
clarkb | I have found post dinner to be good for reviews | 20:20 |
mordred | clarkb: wake up at 6:30am PST like fungi and I! | 20:21 |
fungi | i also think one of the things which keeps my review average on the positive side, besides addressing concerns via irc only (which i should definitely stop doing) is not leaving a negative vote if someone else already has unless it's for a different issue, even if i reviewed the current state of the patch | 20:21 |
mordred | yah | 20:21 |
fungi | i should probably just get in the habit of it, and not worry so much about people potentially getting offended by negative score dogpiling | 20:22 |
lifeless | fungi: I usually do what you describe there | 20:23 |
mordred | clarkb: btw - you're welcome for the email I just sent ;) | 20:23 |
lifeless | fungi: often if a patch has a -1 already, I won't even review it | 20:23 |
lifeless | other than a cursory check to see if the submitter replied saying 'no, I disagree' | 20:23 |
mordred | for the folks here who are not HP employees (shocking) I just sent an email to the internal openstack interest mailing list with the subject "Clark Boylan is the most active reviewer in all of OpenStack for Icehouse" | 20:23 |
mordred | lifeless: I actually have a search filter that keeps me from seeing things with a -1 | 20:24 |
mordred | but largely that's because if jeblair or clarkb or fungi have -1'd something, it's pretty darned solid | 20:25 |
*** yolanda has quit IRC | 20:25 | |
*** sandywalsh has quit IRC | 20:26 | |
fungi | especially if i -1'd it... must have been written in go or something | 20:27 |
*** weshay has quit IRC | 20:27 | |
mordred | rustlebee: wow. your -2 count is so high! | 20:27 |
rustlebee | feature freeze did it probably | 20:28 |
sdague | mordred: feature freeze does that | 20:28 |
mordred | ah | 20:28 |
rustlebee | but i do tend to have a higher -2 count than most anyway :) | 20:28 |
jeblair | so it looks like today is exercising the nodepool burst code | 20:28 |
rustlebee | i love saying "NO!" | 20:28 |
mordred | neat | 20:28 |
sdague | you would not believe the crazy that people push after FF :) lots of people don't pay attention to the calendar | 20:28 |
mordred | we should have a "Block" button which isn't tied to code review | 20:29 |
fungi | or to the mailing list or to irc or to other people's comments on their other reviews or | 20:29 |
jeblair | if you look at the nodepool graph, the top of the green line isn't flat anymore; i think whenever that's the case, and the green line is above its normal level, it's bursting due to demand from gearman | 20:29 |
jeblair | (if it's not flat and it's below the normal level, we're hitting max capacity) | 20:30 |
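jeblair's reading of the nodepool graph can be encoded as a small heuristic: a jagged (non-flat) top above the normal level means demand-driven bursting, a jagged top below it means the pool is pinned at max capacity. The jitter threshold here is invented for illustration, not a real nodepool value.

```python
# Rough classifier for the nodepool graph heuristic described above.
# samples: recent node counts; normal_level: the usual flat ceiling.
def classify_nodepool(samples, normal_level, jitter=2):
    flat = max(samples) - min(samples) <= jitter
    if flat:
        return "steady"
    avg = sum(samples) / len(samples)
    return "bursting" if avg > normal_level else "at-capacity"
```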
mordred | jeblair: oh neat! | 20:30 |
lifeless | oh btw infra people | 20:30 |
lifeless | tripleo now has an externally accessible trunk deployed kvm cloud | 20:31 |
lifeless | updates every 40m +- | 20:31 |
mordred | lifeless: when you say updates - you mean goes away and comes back, yeah? | 20:31 |
lifeless | erm, trunk of OpenStack's API services etc, for clarity (it's not /just/ trunk of tripleo's code:) | 20:32 |
lifeless | mordred: yes, making it preserve vm's is the next MVP | 20:32 |
mordred | neat! | 20:32 |
lifeless | mordred: and after that having it not interrupt shit | 20:32 |
lifeless | right now hiera has credentials for infra on the grizzly kvm cloud | 20:32 |
lifeless | which should be very reliable as it's entirely static | 20:32 |
jeblair | lifeless: i'm excited about all of that | 20:33 |
lifeless | this is just a headsup on where the next thing is at | 20:33 |
* mordred can't wait until we add some nodepool load to your CD cloud so we can watch you update under piles of load | 20:33 | |
jeblair | ++ | 20:33 |
lifeless | yay :) | 20:34 |
lifeless | jeblair: I believe there is a nodepool bug preventing the tripleo experimental job being enabled? Can we help with that? | 20:34 |
jeblair | lifeless: i think we have the nodepool changes in place to improve our chances of using the grizzly cloud without blowing everything up. | 20:34 |
jeblair | lifeless: i'm not sure all of them are in the running nodepool yet | 20:35 |
jeblair | lifeless: but perhaps this weekend we can restart nodepool and put that in again | 20:35 |
jeblair | since we had a fire this morning, i want to take it easy for a while to give things a chance to catch up and hopefully minimize impact to the release process | 20:36 |
mordred | ++ | 20:36 |
*** prad_ has quit IRC | 20:36 | |
lifeless | jeblair: ack, thanks | 20:37 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Rename ASRT -> AGT https://review.openstack.org/51267 | 20:37 |
sdague | clarkb: you about? | 20:37 |
sdague | I wanted to get your take on the os_loganalyze tree to figure out what more I should do before we start connecting it up to the log server. Current code won't really change anything, but at least now I have a framework of test in place so I can figure out if I break something. | 20:40 |
*** Bada has quit IRC | 20:41 | |
sdague | so I can feel confident in doing the keystone and swift log support | 20:41 |
openstackgerrit | A change was merged to openstack-infra/gitdm: add user to openstack-config https://review.openstack.org/50425 | 20:41 |
mordred | wow. that's such a good commit message | 20:41 |
*** dafter has quit IRC | 20:41 | |
mordred | the other reason my review % is so high is that I keep reviewing jeblair code. | 20:42 |
jeblair | mordred: nice, now i can't say anything bad about your 90% average. :) | 20:42 |
boris-42 | jeblair hi | 20:43 |
jeblair | boris-42: hello | 20:43 |
clarkb | sdague: ish. train wifi/verizon not so good | 20:43 |
boris-42 | jeblair how are you? | 20:43 |
mordred | jeblair: :) | 20:44 |
jeblair | boris-42: i am well. how are you? | 20:44 |
*** briancline has quit IRC | 20:44 | |
clarkb | sdague: I think starting with a 1:1 move is good then we can tack on bug fixes | 20:44 |
mordred | jeblair: any reason I should not APRV a nodepool change? I kinda feel like you should handle landing those at the moment - am I being overly cautious? | 20:44 |
*** tvb|afk has joined #openstack-infra | 20:44 | |
boris-42 | jeblair nice thanks. I would like to add benchmarking & profiling tool to OpenStack CI =) so probably you will be interested | 20:45 |
*** ruhe has joined #openstack-infra | 20:45 | |
jeblair | mordred: nope -- as long as it doesn't require a coordinated config file change, should be safe. nodepool doesn't auto-restart, so it doesn't take effect until we restart it manually for some reason. | 20:45 |
*** ruhe has quit IRC | 20:46 | |
jeblair | boris-42: yes, very much! do you think it would be a good idea to send an email to openstack-infra@lists.openstack.org to tell us a bit about the tool? | 20:46 |
*** thomasm has quit IRC | 20:46 | |
boris-42 | jeblair could we move to #openstack-rally | 20:46 |
*** alcabrera has quit IRC | 20:47 | |
sdague | boris-42: I think it would be better here | 20:47 |
sdague | having a million subchannels doesn't help keep folks on board | 20:48 |
boris-42 | sdague jeblair it's a separate project … but ok | 20:48 |
boris-42 | sdague jeblair here is the wiki https://wiki.openstack.org/wiki/Rally | 20:48 |
markmcclain | so looks like we hit the time limit on py26 neutron tests... | 20:48 |
markmcclain | http://logs.openstack.org/08/50608/2/gate/gate-neutron-python26/de9ae8c/console.html | 20:48 |
boris-42 | sdague jeblair actually the official announcement will be this Monday.. | 20:48 |
markmcclain | it actually succeeded, but the gate failed since it ran over the hour | 20:49 |
jeblair | markmcclain: do neutron unit tests really take twice as long as a full tempest run? | 20:49 |
sdague | markmcclain: yeh, an hour is pretty long | 20:49 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Add a thread dump signal handler https://review.openstack.org/51248 | 20:49 |
markmcclain | I'm surprised by the runtime a bit | 20:50 |
sdague | it's seemingly not doing anything for the first 15 minutes | 20:50 |
sdague | figuring out why, would be helpful | 20:50 |
sdague | jeblair: they did pass 40 mins on py26 during rc phase | 20:50 |
clarkb | is git being slow again? | 20:50 |
sdague | so if there was some new 15 min delay, I could see that smashing into 60 | 20:50 |
sdague | clarkb: I don't know | 20:51 |
sdague | 2013-10-11 18:44:23.275 | Building remotely on centos6-6 in workspace /home/jenkins/workspace/gate-neutron-python26 | 20:51 |
sdague | 2013-10-11 19:02:42.590 | [gate-neutron-python26] $ /bin/bash -xe /tmp/hudson3680450177999363028.sh | 20:51 |
clarkb | git seems fine. that delay at the beginning is weird though | 20:52 |
jeblair | clarkb: hrm, the 15 min delay looks like it's from jenkins | 20:52 |
clarkb | jeblair: ya | 20:52 |
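The gap sdague spotted can be computed directly from the two console lines he pasted, whose timestamps lead each log line:

```python
# Compute the idle gap between two Jenkins console log lines of the
# form "YYYY-MM-DD HH:MM:SS.mmm | message".
from datetime import datetime

def gap_minutes(line_a, line_b):
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    stamps = [datetime.strptime(line.split(" | ")[0], fmt)
              for line in (line_a, line_b)]
    return (stamps[1] - stamps[0]).total_seconds() / 60

a = "2013-10-11 18:44:23.275 | Building remotely on centos6-6 in workspace /home/jenkins/workspace/gate-neutron-python26"
b = "2013-10-11 19:02:42.590 | [gate-neutron-python26] $ /bin/bash -xe /tmp/hudson3680450177999363028.sh"
```

For these two lines the gap works out to roughly 18 minutes before the job script started.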
*** melwitt has quit IRC | 20:53 | |
hub_cap | so given that we want to cache the images for dib in the d-g jobs, its probably safe to assume we should run the entire 10-* script that does the work, ya? example: https://github.com/openstack/diskimage-builder/blob/master/elements/ubuntu/root.d/10-cache-ubuntu-tarball | 20:54 |
jeblair | boris-42: ok, how can we help you? | 20:54 |
boris-42 | jeblair Rally is able to deploy cloud and test it, or just test it=) | 20:54 |
hub_cap | otherwise if we put dib --offline, we will only have done 1/10'th of the work to make the image dib usable | 20:55 |
jeblair | boris-42: which do you want to do first? | 20:55 |
boris-42 | jeblair to test it it requires only endpoints of cloud | 20:55 |
hub_cap | what say you to that lifeless? (see my last 2 msgs) | 20:55 |
boris-42 | jeblair at the end I would like to deploy and test | 20:55 |
jeblair | clarkb: the current job running on centos6-6 did not have a delay | 20:55 |
clarkb | jeblair: could a job have taken over the node before eg bug in gearman plugins locking? | 20:56 |
boris-42 | jeblair Rally will support different deploy engines (at moment only DevStack) but in future TripleO and Fuel | 20:56 |
jeblair | boris-42: so we have added some hooks to the devstack-gate script that let you use a lot of the functionality in it | 20:56 |
* fungi is popping out for an early dinner, but will return soon | 20:56 | |
openstackgerrit | A change was merged to openstack-infra/config: Fix sqlalchemy-migrate py26/sa07 job https://review.openstack.org/44686 | 20:56 |
clarkb | so 15 minutes of some other job running? | 20:57 |
openstackgerrit | A change was merged to openstack-infra/config: Add tagging permissions to python-libraclient https://review.openstack.org/45294 | 20:57 |
boris-42 | jeblair to test existing cloud I should have only Rally & cloud enpoints | 20:57 |
jeblair | boris-42: so you should be able to write a job that runs rally on a cloud set up by devstack | 20:57 |
jeblair | boris-42: or you can write a job like devstack-gate that uses rally to set up a cloud instead of devstack | 20:57 |
*** senk has joined #openstack-infra | 20:58 | |
boris-42 | jeblair interesting, ok I think it will be simpler to start by writing just a job that runs tests against your already deployed devstack cloud | 20:58 |
sdague | anyone up for helping me get this tree into gerrit? | 20:59 |
*** julim has quit IRC | 20:59 | |
jeblair | boris-42: ok. you can look at the swift-devstack-vm-functional jobs for an example of how to do something like that | 21:00 |
boris-42 | jeblair thank you! | 21:01 |
boris-42 | jeblair will try on next week=) | 21:01 |
clarkb | sdague: I can try. reading ci.openstack.org/stackforge.html is a good place to start | 21:01 |
*** melwitt has joined #openstack-infra | 21:01 | |
jeblair | clarkb: https://jenkins02.openstack.org/job/gate-nova-python26/6747/console https://jenkins02.openstack.org/job/gate-neutron-python26/2257/console https://jenkins02.openstack.org/job/gate-horizon-python26/1379/console | 21:02 |
jeblair | clarkb: that's the job before, the neutron job, and the job after | 21:02 |
jeblair | timestamps don't seem to overlap | 21:02 |
jeblair | clarkb: and neither the job before or after did that | 21:02 |
jeblair | i'm leaning toward 'jenkins got busy' or 'jenkins got semi-deadlocked' or 'jenkins garbage collected' or, well, in general, just blaming jenkins for being jenkins. | 21:03 |
clarkb | jeblair: this is weird. ya jenkins for being jenkins seems plausible | 21:03 |
*** jerryz has quit IRC | 21:04 | |
jeblair | markmcclain, sdague: so it looks like 15 min of that runtime is jenkins derping. let's call that a fluke for the moment, unless it happens with significant regularity. | 21:04 |
sdague | jeblair: sounds fair | 21:04 |
sdague | clarkb: ok, I'm assuming this will live in openstack-infra/ and will propose a patch accordingly | 21:05 |
jeblair | sdague, clarkb: ++ | 21:05 |
*** senk has quit IRC | 21:06 | |
clarkb | sdague yup. the stackforge page is a decent template for what you need though | 21:07 |
*** miqui has quit IRC | 21:08 | |
*** matty_dubs is now known as matty_dubs|gone | 21:09 | |
sdague | what's included in python-jobs? | 21:10 |
clarkb | pep8 pythonXX and pypy | 21:10 |
clarkb | also gate-*-docs | 21:10 |
clarkb | and coverage | 21:10 |
*** lcestari has quit IRC | 21:14 | |
*** sarob has joined #openstack-infra | 21:14 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: add os-loganalyze to gerrit & zuul https://review.openstack.org/51299 | 21:15 |
sdague | so, that about right? | 21:15 |
*** CaptTofu has quit IRC | 21:17 | |
*** CaptTofu has joined #openstack-infra | 21:17 | |
clarkb | sdague the pep8 and python jobs are just gate-* no check-* | 21:17 |
sdague | ok | 21:18 |
sdague | let me fix that quick | 21:18 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: add os-loganalyze to gerrit & zuul https://review.openstack.org/51299 | 21:18 |
*** anteaya has joined #openstack-infra | 21:19 | |
anteaya | clarkb: I am meeting all sorts of elastic search people | 21:19 |
*** SergeyLukjanov has quit IRC | 21:19 | |
dkranz | clarkb: My tempest job watcher thinks only four tempest gate jobs have finished in the past few hours. Is it wrong or did I just pick a bad time to start with this? | 21:19 |
anteaya | do you have a list of bugs or an etherpad that outlines your current pain points with logstash and elastic search so I can read up and ask intelligent questions | 21:19 |
*** SergeyLukjanov has joined #openstack-infra | 21:20 | |
anteaya | and maybe find out something useful for you? | 21:20 |
clarkb | anteaya: I don't they are fairly nebulous around scaling | 21:20 |
clarkb | sdague lgtm | 21:20 |
anteaya | clarkb: yeah that is what I understood | 21:20 |
clarkb | anteaya I need to upgrade to latest next week | 21:20 |
clarkb | newer versions are supposed to be better | 21:20 |
anteaya | do you think that will address some of the current scaling issues? | 21:20 |
anteaya | k | 21:21 |
anteaya | I'll ask about versions tomorrow | 21:21 |
*** mrodden has quit IRC | 21:21 | |
anteaya | what version of logstash and elastic search are we using right now | 21:21 |
anteaya | and what do you want to go to next week? | 21:21 |
clarkb | yes es memory use is much better in 0.90.X apparently | 21:21 |
sdague | clarkb: next time you are in logstash, I have requests for 2 pieces of metadata to get added to the runs | 21:21 |
sdague | 1) cloud-az | 21:21 |
sdague | 2) branch | 21:22 |
*** CaptTofu has quit IRC | 21:22 | |
clarkb | dkranz: I don't know. currently on a poor connection. | 21:23 |
anteaya | clarkb: the one bit of info I got from my after dinner walk around Budapest companions, who just happen to have an elastic search as a service company - what luck - is that they run many small clusters rather than large clusters | 21:23 |
clarkb | sdague: noted | 21:23 |
sdague | clarkb: thanks :) | 21:24 |
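sdague's request amounts to attaching two extra fields to each event before it reaches logstash. The sketch below is a hedged illustration only: the field names and event shape are assumptions, not the actual infra log-pusher code.

```python
# Hypothetical annotation step for the logstash pipeline: tag each
# event with the cloud AZ the job ran in and the branch under test.
def annotate_event(event, cloud_az, branch):
    fields = event.setdefault("@fields", {})
    fields["cloud_az"] = cloud_az  # e.g. which provider/AZ ran the job
    fields["branch"] = branch      # e.g. "master" or "stable/havana"
    return event
```

Tagging at submission time means both fields become queryable facets in Kibana without reparsing the logs.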
dkranz | clarkb: ok, given that my patch is still hung in zuul almost 4 hours later perhaps it is just slow | 21:24 |
anteaya | I'm not sure how the size of our cluster would be characterized | 21:24 |
clarkb | anteaya: interesting I wonder how they shard across clusters | 21:24 |
anteaya | I can ask | 21:24 |
*** esker has quit IRC | 21:27 | |
sdague | clarkb: ok, jenkins did a +1 - https://review.openstack.org/#/c/51299/ | 21:28 |
*** vipul is now known as vipul-away | 21:28 | |
*** vipul-away is now known as vipul | 21:28 | |
sdague | jeblair, you got a sec to check that out as well? | 21:28 |
sdague | I'd like to get this over so I can at least call that part good before the weekend, if possible :) | 21:29 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 21:32 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 21:32 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 21:33 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 21:33 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 21:33 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 21:33 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 21:34 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 21:34 | |
*** blamar has joined #openstack-infra | 21:39 | |
*** vipul is now known as vipul-away | 21:43 | |
openstackgerrit | A change was merged to openstack-dev/pbr: Do not pass unicode where byte strings are wanted https://review.openstack.org/48355 | 21:47 |
fungi | sdague: you still have teh typoz | 21:50 |
*** anteaya has quit IRC | 21:51 | |
*** vipul-away is now known as vipul | 21:54 | |
*** mgagne has quit IRC | 21:56 | |
fungi | so as far as the py26 unit test timeout, i see than jenkins02 is in the midst of one of those use-all-the-things fits and is well on its way to memory exhaustion as a result... http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=41&page=2 | 21:58 |
fungi | s/than/that/ | 21:58 |
fungi | i give it 30-60 minutes before available ram is full | 21:59 |
fungi | though looking at the swap graph, yesterday's oom condition didn't happen until it reached around 0.5g swap used and then suddenly spiked in a matter of 10-20 minutes until it was up to 2g swap | 22:01 |
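fungi's observation, that yesterday's OOM only spiraled once swap use crossed about 0.5g, suggests a simple early-warning check. This is a generic Linux /proc/meminfo sketch, not infra's actual monitoring; the threshold is taken from the conversation.

```python
# Minimal swap-pressure check: parse /proc/meminfo text and alert once
# swap usage crosses the ~0.5 GiB level that preceded the last OOM.
def swap_used_gib(meminfo_text):
    vals = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("SwapTotal", "SwapFree"):
            vals[key] = int(rest.split()[0])  # values are in kB
    return (vals["SwapTotal"] - vals["SwapFree"]) / (1024 * 1024)

def should_alert(meminfo_text, threshold_gib=0.5):
    return swap_used_gib(meminfo_text) >= threshold_gib
```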
clarkb | :/ is there a newer version of jenkins out? we could try upgrading | 22:02 |
jeblair | fungi can you gracefully stop and restart it? | 22:02 |
fungi | jeblair: i definitely can | 22:02 |
fungi | was wondering if we wanted to troubleshoot further first, since we've caught it in this state | 22:02 |
fungi | i'm checking the thread count real quick | 22:02 |
*** senk has joined #openstack-infra | 22:03 | |
jeblair | i am afk and not useful | 22:03 |
*** dkranz has quit IRC | 22:03 | |
fungi | no worries--i'm collecting what details i can first | 22:03 |
fungi | but will definitely try to cycle it here in a moment and see if that helps | 22:03 |
*** gyee has quit IRC | 22:03 | |
clarkb | ++ | 22:04 |
*** pcm_ has quit IRC | 22:04 | |
clarkb | I cant help for a bit but should have proper wifi in about an hour | 22:04 |
*** SergeyLukjanov has quit IRC | 22:04 | |
fungi | thread count is highish but reasonable. not like that other time where it went batty | 22:05 |
*** SergeyLukjanov has joined #openstack-infra | 22:05 | |
fungi | Threads on jenkins02.openstack.org@166.78.48.99: Number = 1,935, Maximum = 3,390, Total started = 106,699 | 22:05 |
Steely_Spam | https://review.openstack.org/#/c/49622/ | 22:05 |
Steely_Spam | is that hanging out because it got a -1 during check? | 22:05 |
Steely_Spam | I didn't think that was a thing | 22:06 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 22:06 | |
fungi | for comparison... | 22:06 |
fungi | Threads on jenkins01.openstack.org@166.78.188.99: Number = 1,422, Maximum = 19,590, Total started = 807,517 | 22:06 |
*** _SergeyLukjanov is now known as SergeyLukjanov | 22:06 | |
*** senk has quit IRC | 22:07 | |
clarkb | no it should clear the -1 and move on. that is why zuul leaves a gate jobs starting comment | 22:07 |
Steely_Spam | clarkb: okay, I thought so... | 22:08 |
fungi | clarkb: Steely_Spam: though in this case i'm not finding it on the zuul status page | 22:08 |
Steely_Spam | fungi: right, it's not in the queue for some reason | 22:09 |
fungi | it got a new patchset upload after it was approved but before it merged, then got approved again | 22:09 |
*** jerryz has joined #openstack-infra | 22:09 | |
Steely_Spam | maybe a reverify would kick it? | 22:09 |
fungi | it's possible it was re-approved while the previous patchset was still in the process of waiting to be kicked out of today's extremely slow gate | 22:09 |
fungi | Steely_Spam: so, yes, try to reverify and see if jenkins leaves a new "starting gating" comment on it after that | 22:10 |
* Steely_Spam tries | 22:10 | |
jerryz | fungi: it is still not unusual for me to run into this bug: https://bugs.launchpad.net/openstack-ci/+bug/1225664 | 22:10 |
uvirtbot | Launchpad bug 1225664 in openstack-ci "tempest.api.volume.test_volumes_actions.VolumesActionsTestXML flakey failure" [High,Triaged] | 22:10 |
Steely_Spam | fungi: related question: can I put a recheck/reverify command on the first line and more comment below it, or does the whole comment have to be just the command in order to work? | 22:10 |
fungi | jerryz: did you hit it recently? | 22:11 |
jerryz | fungi: i also see in e-r status report several reviews also fail due to that bug | 22:11 |
Steely_Spam | fungi: yes, that kicked it and like ten behind it, thanks :) | 22:11 |
fungi | Steely_Spam: no, it's a very strict match right now, no comments in the same post. i usually leave a second comment with my details | 22:11 |
Steely_Spam | fungi: okay, I've been doing the same, just wondering | 22:11 |
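The strict matching fungi describes can be illustrated with a regex: the whole comment must be just the command, so any trailing explanation defeats it. This pattern is a simplified stand-in for zuul's actual trigger configuration, not a copy of it.

```python
# Simplified illustration of a strict recheck/reverify comment match:
# the comment must contain the command and nothing else.
import re

COMMAND = re.compile(r"^(recheck|reverify)( (bug \d+|no bug))?$")

def is_command(comment):
    return bool(COMMAND.match(comment.strip()))
```

Hence fungi's habit of posting the command alone and putting details in a second comment.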
jerryz | fungi: my code base tested should be two or three days ago | 22:11 |
jerryz | fungi: but in e-r 's report, recent reviews also hit similar failure | 22:12 |
fungi | jerryz: it's also possible the elastic-recheck criteria for matching that issue are too vague and catching more than one problem under that umbrella | 22:12 |
fungi | jerryz: link to a recent failure or the report you're talking about? | 22:13 |
jerryz | Affecting changes: 42523, 46696, 46479, 46206, 46598, 45306, 46738, 46777, 46219, 46792, 42240 | 22:13 |
jerryz | https://review.openstack.org/#/c/42240/ | 22:13 |
*** dcramer_ has quit IRC | 22:14 | |
*** tvb|afk has quit IRC | 22:14 | |
fungi | jerryz: thanks--i'll try to take a look in a bit once i've got jenkins02 back under control | 22:14 |
fungi | heh... top reports the jvm on jenkins02 is using 40g of virtual memory. it doesn't have but 32g including swap | 22:15 |
fungi | must be shared | 22:15 |
fungi | resident is 26g though | 22:16 |
fungi | okay, jenkins02 is preparing for shutdown. i'll restart the service once all jobs complete | 22:17 |
fungi | probably about 30 minutes | 22:18 |
jeblair | fungi i think nodepool is running the new code that should shift load to jenkins01. you may want to keep an eye on jenkins01. | 22:19 |
fungi | yeah, as of this morning's restart. i was thinking about that as well | 22:20 |
jeblair | since theres a lot of untested stuff going on. | 22:20 |
jeblair | if jenkins01 gets overloaded we may need to add a cap in nodepool. | 22:21 |
fungi | jerryz: okay, i see that's the swift storage cap being exceeded? you might see if afazekas wants to work on enlarging that since he did the past couple of changes for it (or propose a similar one?) | 22:21 |
fungi | jeblair: definitely agree | 22:21 |
jgriffith | jerryz: question on that... | 22:21 |
jgriffith | jerryz: which case of it are you seeing? | 22:21 |
jeblair | fungi if there is a prob you can adjust provider max values in nodepool.yaml to quickly get a similar effect. | 22:22 |
fungi | jeblair: noted--thanks! | 22:22 |
*** SergeyLukjanov has quit IRC | 22:22 | |
jerryz | jgriffith: https://review.openstack.org/#/c/46531 and https://review.openstack.org/#/c/42240/ | 22:23 |
jerryz | those are recent failures | 22:23 |
fungi | i'll afk for a few minutes while jenkins02 finishes up and brb | 22:24 |
jgriffith | jerryz: interesting... 500 failure back from the glance client | 22:26 |
jgriffith | jerryz: http://logs.openstack.org/31/46531/6/gate/gate-tempest-devstack-vm-postgres-full/aa0cbc2/logs/screen-c-vol.txt.gz#_2013-10-10_15_52_57_753 | 22:26 |
*** thedodd has quit IRC | 22:30 | |
*** CaptTofu has joined #openstack-infra | 22:30 | |
lifeless | jeblair: I'd like to offer all TripleO ATC's accounts on this cloud; I could just mail -dev but I'm pondering whether something more directed (e.g. direct email) would be good | 22:31 |
*** sarob has quit IRC | 22:31 | |
BobBall | fungi: Is there any way to access a vnc console or similar for VMs in the HP cloud? | 22:33 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: add os-loganalyze to gerrit & zuul https://review.openstack.org/51299 | 22:33 |
sdague | fungi: oops, thanks | 22:33 |
*** changbl has quit IRC | 22:35 | |
fungi | lifeless: if it's decided that an e-mail list of tripleo atcs is warranted, i can generate one on whatever set of repositories and timeframe you want, basically same as we would for a tripleo ptl election | 22:36 |
*** rcleere has quit IRC | 22:36 | |
jeblair | lifeless recommend -dev for now as i'd want to carefully consider giving out email addrs | 22:36 |
fungi | agreed. i'm hesitant as well, but it's a technical possibility | 22:37 |
lifeless | jeblair: ack | 22:37 |
jeblair | i personally think this is a fine use, but i dont want to surprise anyone or break any implied trusts | 22:37 |
fungi | (and you can always just scrape the git commit logs, but that's got the same privacy concerns) | 22:37 |
fungi | BobBall: i believe so, but it's been a while since i needed console access to an hpcloud vm | 22:38 |
jeblair | so lets separately come up with some policy for the future | 22:38 |
sdague | fungi: can I get another look from you on the os_loganalyze add - https://review.openstack.org/51299 ? | 22:38 |
*** datsun180b has quit IRC | 22:38 | |
fungi | sdague: yep, was about to pull it back up | 22:39 |
BobBall | fungi: any ideas how I might do that? the web interface doesn't seem to give me a clue... | 22:39 |
sdague | coolio | 22:39 |
*** CaptTofu has quit IRC | 22:39 | |
fungi | sdague: keep in mind i only -1'd you to game my review positivity stats ;) | 22:39 |
sdague | :) | 22:40 |
*** CaptTofu has joined #openstack-infra | 22:40 | |
clarkb | BobBall I am not sure you can. I had the same problem last I tried | 22:40 |
fungi | that's sucky | 22:41 |
openstackgerrit | Dan Nguyen proposed a change to openstack/requirements: Add pwtools to requirements for password generator https://review.openstack.org/51068 | 22:41 |
* BobBall sighs deeply | 22:41 | |
BobBall | that's a real shame... | 22:41 |
lifeless | jeblair: cool, thanks | 22:41 |
fungi | on the other hand, it seems like a good chunk of nova denial of service issues were related to novnc, so maybe disallowing access there is a defensive measure | 22:42 |
* BobBall bangs his head against the soft fluffy HP cloud | 22:42 | |
lifeless | BobBall: oh? | 22:42 |
BobBall | Struggling trying to get Xen booting nested so we can look at gating tests... and the lack of VNC access means I can't play with boot parameters - once I set them, and it fails, I have to reinstall the machine | 22:43 |
BobBall | it's a right pain | 22:43 |
*** CaptTofu has quit IRC | 22:44 | |
lifeless | BobBall: oh :) | 22:44 |
fungi | i know rackspace provides a console. on the down side the reason i know that is because of having to frequently try to troubleshoot crashed/hung/dead virtual machines | 22:44 |
lifeless | BobBall: erm, I meant oh :( | 22:44 |
lifeless | BobBall: do you have xen booting locally using kvm ? | 22:44 |
lifeless | BobBall: could you just upload a custom image? | 22:44 |
BobBall | we've had it working, yes | 22:44 |
fungi | lifeless: via that awesome glance service they offer their customers ;) | 22:45 |
lifeless | fungi: yup, we have that | 22:45 |
BobBall | not seen that upload a custom image? | 22:45 |
fungi | lifeless: is it no longer in beta? | 22:45 |
lifeless | fungi: it's in public beta still I believe | 22:45 |
jerryz | jgriffith: can i file a bug? | 22:46 |
fungi | well, public beta is way better than secret beta. that's something rackspace still hasn't provided | 22:46 |
BobBall | lifeless: how would I do that? | 22:46 |
jgriffith | jerryz: the bug that you pointed to is valid. Just need to add cinder and possibly glance but not sure yet | 22:46 |
jgriffith | jerryz: I'll have to get back to it here when I have some more time | 22:47 |
BobBall | fungi: RS cloud is even less fun - in theory it's doable but in practice we need an HVM linux guest which is a pain to get hold of with RS cloud :P | 22:47 |
* fungi nods | 22:47 | |
jgriffith | jerryz: feel free to add Cinder to the projects, I don't think it's an infra bug that's for sure | 22:47 |
BobBall | this is the joy of nested virt... | 22:48 |
lifeless | BobBall: hardware assisted virt will be disabled in the kvm vms though surely | 22:48 |
lifeless | BobBall: go to https://account.hpcloud.com/services | 22:49 |
lifeless | BobBall: select us east in the beta section and request access | 22:49 |
lifeless | BobBall: then once you get that, you can ask for glance access too | 22:50 |
BobBall | great, thanks lifeless | 22:50 |
lifeless | BobBall: it was about 24 hour turnaround when I got it enabled on the -infra account | 22:50 |
lifeless | though I don't think they've done anything with it:P | 22:50 |
BobBall | beta request sent :) | 22:50 |
lifeless | BobBall: I'd be delighted to help you get a physical test environment up, if you guys have machines - we should be able to use nova baremetal + nodepool to get you d-g style instances of actual xen deployed pretty easily | 22:52 |
BobBall | we do - although not nearly the number of machines that -infra use for the gate :) | 22:54 |
sdague | clarkb, jeblair: either of you good with putting this through https://review.openstack.org/#/c/51299/ ? then we could get the gerrit core team set, and I can make changes on that side | 22:54 |
BobBall | virtualisation should work - it _really_ should... | 22:55 |
jgriffith | jerryz: cool.. thanks! | 22:56 |
*** dcramer_ has joined #openstack-infra | 22:56 | |
fungi | sdague: clarkb seemed basically okay with the previous patchset in irc. i'm okay approving it and will troubleshoot whatever i might overlook | 22:56 |
sdague | fungi: that would be awesome | 22:57 |
lifeless | BobBall: how many concurrent vm's does a gate run need though? | 22:57 |
sdague | then add me + infra-core to the core team in gerrit | 22:57 |
lifeless | BobBall: say one for d-g itself, and some N concurrent test instances: one solid xen machine should be able to support at least 5 or 6 concurrent d-g style tests. | 22:58 |
lifeless | BobBall: (without slowing each test down, I mean) | 22:58 |
sdague | I'm on for about the next 20 mins | 22:58 |
sdague | then it's off to Plan 9 - http://www.bardavon.org/mobile/event_info.php?id=694 | 22:59 |
BobBall | Perhaps - although I figured we needed one host per VM that's running tests - just to ensure there aren't any cross-interactions which might cause problems? | 23:00 |
BobBall | although maybe I don't understand what d-g style tests are :P | 23:00 |
lifeless | BobBall: d-g runs devstack which you'd want configured to talk to xen | 23:00 |
lifeless | BobBall: I don't know xen well; could you have multiple devstacks talking to one xen ? | 23:00 |
BobBall | in theory, sure | 23:01 |
BobBall | but if you have it then there is a risk of one set of tests interacting with another | 23:01 |
lifeless | k | 23:01 |
*** boris-42 has quit IRC | 23:01 | |
BobBall | e.g. if you break the xenserver in a horrible way (or the plugins don't match...) then it might show up as a failure when it shouldn't have | 23:01 |
lifeless | perhaps have it just run nova gates? | 23:01 |
BobBall | That'd be easier for sure | 23:01 |
BobBall | so how many hosts do you think might be needed? | 23:02 |
openstackgerrit | A change was merged to openstack-infra/config: add os-loganalyze to gerrit & zuul https://review.openstack.org/51299 | 23:02 |
lifeless | nova is a pretty big fraction of the changes | 23:03 |
lifeless | but | 23:03 |
lifeless | I don't have a gut feel - clarkb / fungi may well | 23:03 |
*** senk has joined #openstack-infra | 23:03 | |
lifeless | the full gate, remembering my back of envelope figures | 23:04 |
lifeless | was 400 changes in one day | 23:04 |
lifeless | at 30m each | 23:04 |
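Working through lifeless's back-of-envelope figures (400 changes/day at ~30 minutes each), the minimum steady-state concurrency comes out like this; the calculation is a sketch, and deliberately ignores retries, gate resets, and daytime peaks, all of which push the real requirement higher.

```python
import math

# Figures quoted in the log: ~400 changes/day, ~30 min of testing each.
changes_per_day = 400
minutes_per_run = 30

total_test_minutes = changes_per_day * minutes_per_run  # 12000
minutes_per_day = 24 * 60                               # 1440

# Minimum concurrent test environments needed just to keep pace,
# assuming perfectly even arrival of changes over the day.
concurrent_slots = math.ceil(total_test_minutes / minutes_per_day)
print(concurrent_slots)  # 9
```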
BobBall | I see | 23:05 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Change test_queries from logical AND to OR https://review.openstack.org/50160 | 23:05 |
sdague | fungi: so now that it's merged, we just want for the next puppet update to trigger the import? | 23:06 |
BobBall | oh rubbish - just realised it's midnight | 23:06 |
BobBall | I really should get some sleep | 23:06 |
lifeless | BobBall: https://etherpad.openstack.org/tripleo-test-cluster | 23:06 |
lifeless | BobBall: we figured 40 concurrent test environments is sufficient | 23:06 |
lifeless | BobBall: so 40 small machines for xen | 23:06 |
fungi | sdague: yup, and then i'll add you as the initial core group member, and add the infra core group as included | 23:06 |
sdague | fungi: cool | 23:06 |
lifeless | BobBall: perhaps a moonshot chassis fully loaded? | 23:07 |
BobBall | that'd be a very nice way to do it | 23:08 |
*** senk has quit IRC | 23:08 | |
fungi | lifeless: BobBall: if you're just talking about gating load, have a look at http://status.openstack.org/zuul/ and note that each job listed for a change is using an 8gb vm with 4x vcpu | 23:09 |
lifeless | fungi: moonshot is dual core + hyperthreads with 8GB | 23:09 |
fungi | so depending on the project you're gating, maybe around 10ish servers in parallel | 23:09 |
fungi | lifeless: sounds comparable | 23:10 |
lifeless | fungi: right, it's why I suggested it. | 23:10 |
fungi | is that the arm hardfloat version or the atom one? | 23:10 |
lifeless | fungi: there will be higher density cartridges in future, of course | 23:10 |
BobBall | 10 doesn't sound enough to me if I'm honest | 23:10 |
lifeless | fungi: atom, it even has VTx | 23:10 |
fungi | BobBall: i meant 10ish per change you want to test in parallel | 23:10 |
BobBall | I'm very tempted by the moonshot idea | 23:11 |
lifeless | BobBall: fungi means 10 * - 10 servers per commit, but I think he's wrong :) | 23:11 |
BobBall | oh I see | 23:11 |
BobBall | why 10 per commit? | 23:11 |
fungi | i may be. checking the veracity of my assertion now | 23:11 |
lifeless | fungi: do you mean 'tempest runs 10 sub-vms'? | 23:11 |
lifeless | fungi: or do you mean 'zuul schedules 10 jobs' ? | 23:11 |
sdague | are you guys talking about devstack/tempest runs? | 23:11 |
sdague | because our experience is the cpu does matter quite a bit | 23:11 |
BobBall | We're talking about adding a devstack/tempest/xenapi run somehow :) | 23:12 |
sdague | which is why the rax nodes aren't used | 23:12 |
sdague | so atom... not a great idea :) | 23:12 |
fungi | just talking about jobs in general. if you were to replicate *all* of our gating, we use 9 virtual machines in parallel for each iteration of attempting to gate a nova change, for example | 23:12 |
lifeless | sdague: mmm, I'd seriously consider native atom over virtualised $other :> | 23:12 |
lifeless | fungi: right, so thats the wrong way to look at it | 23:13 |
BobBall | ahhh ok | 23:13 |
fungi | not sure what metric BobBall was looking for there | 23:13 |
lifeless | fungi: the way to look at is is we're adding one more job to that set. | 23:13 |
fungi | oh, in that case one per change tested in parallel | 23:13 |
lifeless | fungi: so from 9 vm's to 10, one of which BobBall would be providing in a dedicated xen-capable-environment. | 23:13 |
BobBall | I'm not sure I know either :) | 23:13 |
sdague | lifeless: it seems pretty cpu bound, so virtualized doesn't have much overhead | 23:13 |
lifeless | sdague: tempest is running against qemu vm's | 23:14 |
sdague | lifeless: the qemu vm start times isn't really the issue | 23:14 |
fungi | jenkins02 has been gracefully restarted and is coming up now | 23:14 |
*** sarob has joined #openstack-infra | 23:14 | |
lifeless | sdague: ok; I'll defer to data here. | 23:14 |
lifeless | sdague: just that even cirros can't make the vm's do their stuff well :> | 23:14 |
lifeless | sdague: I would want to investigate a xen-on-moonshot test before writing it off | 23:15 |
sdague | fair, just saying what I've seen. | 23:15 |
lifeless | sdague: these aren't the atoms most folk have seen | 23:15 |
sdague | ok, well even the amd chips in rax give us a 40% slow down compared to the intel chips at hp | 23:16 |
BobBall | well I'd question whether we'd need to run the full set of tempest tests as well - they all pass of course, so that's not the issue, but some of them are entirely independent of the hypervisor driver | 23:16 |
lifeless | http://www8.hp.com/us/en/products/proliant-servers/product-detail.html?oid=5375897#!tab=specs <- the cartridges I'm referring to | 23:16 |
BobBall | the rax chips you were testing on are a fair bit older than the intel ones at HP though | 23:16 |
sdague | lifeless: what's the L3 look like on those? | 23:16 |
sdague | BobBall: fair | 23:16 |
lifeless | http://ark.intel.com/products/series/71265/Intel-Atom-Processor-S1200-Product-Family-for-Server | 23:16 |
sdague | I'd say get some data on a real system first though | 23:17 |
lifeless | sdague: 1 MB | 23:17 |
lifeless | sdague: yes, +1 on getting real data | 23:17 |
sdague | so, I'd be suspicious then. We've seen some pretty strong correlation between L3 size and speed here. | 23:18 |
sdague | but some runs would be good | 23:18 |
BobBall | Do you have access to a moonshot system lifeless? I can probably get access but it's likely to take a while | 23:18 |
sdague | ok, movie time | 23:18 |
lifeless | BobBall: not at the moment, but I know folk who do :/ | 23:18 |
BobBall | okay | 23:18 |
*** fifieldt has joined #openstack-infra | 23:19 | |
fungi | sdague: if it's wood's original plan 9, one of my favorites ;) | 23:19 |
BobBall | okay I'll check with our HP blokey | 23:19 |
lifeless | BobBall: I would suggest, if doing this is a real possibility, that we go in the front door and get a sales person involved - the sales folk have ready access to moonshot for customer evaluations | 23:19 |
lifeless | BobBall: (e.g. fully populated 45 cartridge + two switch chassis) | 23:19 |
BobBall | Maybe. I know someone who has been talking about moonshot so I'll have a few words with him first | 23:20 |
BobBall | and try the glance upload route too :) | 23:21 |
lifeless | cool | 23:21 |
BobBall | all sorts of fun! | 23:21 |
lifeless | if you run into a wall, let me know | 23:21 |
lifeless | I have some interactions with moonshot teams | 23:21 |
BobBall | perfect, thanks. | 23:21 |
BobBall | *sleep* | 23:22 |
*** BobBall is now known as BobBallAway | 23:22 | |
lifeless | gnight! | 23:22 |
*** marktraceur is now known as FreeThaiFood | 23:23 | |
fungi | jeblair: anecdotal but worth watching for next time, we had a great many more devstack jobs end up on jenkins02 as soon as it came up than were running on jenkins01. like it got favored for some reason (maybe accumulated shares from while it was unreachable?) | 23:26 |
fungi | at the moment there are about 5 devstack jobs running on jenkins01 and nearly 50 on jenkins02 | 23:27 |
fungi | but jobs still seem to be running and completing successfully | 23:29 |
fungi | i'll check back in on it in a bit | 23:29 |
*** pentameter has quit IRC | 23:33 | |
*** mriedem has joined #openstack-infra | 23:37 | |
*** nati_uen_ has joined #openstack-infra | 23:45 | |
*** nati_ueno has quit IRC | 23:46 | |
*** FreeThaiFood is now known as marktraceur | 23:48 | |
*** hogepodge has quit IRC | 23:49 | |
*** rnirmal has quit IRC | 23:54 | |
*** vipul is now known as vipul-away | 23:57 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!