fungi | zuul seems to have restarted jobs from jenkins01, as far as i can tell from the status page | 00:00 |
---|---|---|
*** paul-- has joined #openstack-infra | 00:00 | |
fungi | is that new behavior, or am i seeing fairies? | 00:00 |
clarkb | yeah it is running a bunch of jobs | 00:01 |
fungi | i want to say the last time a jenkins master fell over we had jobs stuck in a running state in zuul until we restarted it | 00:01 |
*** oubiwann-lambda has quit IRC | 00:01 | |
fungi | (until we restarted zuul i mean) | 00:01 |
*** nicedice has quit IRC | 00:02 | |
clarkb | we need to make sure that the things that feed off of zmq reconnected | 00:02 |
fungi | nodepool needed a restart last time, right? | 00:02 |
clarkb | yeah and the logstash gearman client | 00:03 |
*** nicedice has joined #openstack-infra | 00:03 | |
clarkb | zmq is supposed to avoid these problems but ugh | 00:03 |
fungi | i can go punt it now | 00:03 |
clarkb | fungi: well we should be able to check if it is connected I think | 00:03 |
fungi | or do we want to wait and see if nodepool retains sanity this time? | 00:03 |
fungi | yeah, that | 00:03 |
*** UtahDave has quit IRC | 00:04 | |
jeblair | i'll check on nodepool | 00:04 |
*** pabelanger has joined #openstack-infra | 00:05 | |
fungi | looks like we're also down four precise nodes across the two masters | 00:06 |
fungi | i'll get them back up and working at some point this evening | 00:06 |
jeblair | nodepool is receiving zmq events from both masters, and is currently assigning nodes to both of them. | 00:07 |
jeblair | (btw, it correctly noted jenkins01 was down earlier and stopped trying to assign nodes to it) | 00:07 |
fungi | that is a significant improvement over last time | 00:07 |
clarkb | jeblair: woot | 00:08 |
*** mdenny has quit IRC | 00:08 | |
clarkb | logstash seems fine too | 00:08 |
*** pabelanger has quit IRC | 00:09 | |
clarkb | I think zmq reconnect mechanism works fine when the disappearance of the service is relatively short | 00:09 |
jeblair | and yeah, zuul should know to restart jobs if gearman fails, so that's (optimistically) expected behavior | 00:09 |
clarkb | it has problems when it is hours long | 00:09 |
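The reconnect behaviour being discussed here is handled by ZeroMQ itself: a SUB socket keeps retrying its connect in the background, so short outages of the publisher (the Jenkins ZMQ event plugin) are papered over automatically. A minimal sketch of such a subscriber follows; the endpoint and reconnect intervals are illustrative assumptions, not the actual nodepool or logstash configuration.

```python
import zmq

# Hypothetical publisher endpoint; the real masters/ports may differ.
JENKINS_ZMQ_EVENTS = "tcp://jenkins01.example.org:8888"

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")             # receive all events
sub.setsockopt(zmq.RECONNECT_IVL, 1000)        # retry every second at first...
sub.setsockopt(zmq.RECONNECT_IVL_MAX, 60000)   # ...backing off to one minute
sub.connect(JENKINS_ZMQ_EVENTS)

while True:
    # Blocks until an event arrives; zmq transparently reconnects
    # if the publisher goes away and comes back.
    topic, _, body = sub.recv().partition(b" ")
    print(topic.decode(), body.decode())
```

As the conversation notes, the socket-level reconnect is not the whole story: consumers can still end up with stale higher-level state after an hours-long outage.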
*** blamar has quit IRC | 00:10 | |
clarkb | https://review.openstack.org/#/c/61321/1 is the change to fix nodepool install on jenkins-dev | 00:10 |
jeblair | clarkb: how did you notice jenkins01 died? | 00:10 |
openstackgerrit | A change was merged to openstack-infra/config: Enable patchset-created for #openstack-state-management channel https://review.openstack.org/61605 | 00:12 |
*** jgrimm has quit IRC | 00:13 | |
clarkb | jeblair: I was going to hold d-g nodes in hopes of debugging the tempest ssh test, and couldn't open the web UI for jenkins01 to find which nodes needed holding | 00:13 |
*** bnemec has quit IRC | 00:14 | |
anteaya | AaronGr: you still around? | 00:14 |
AaronGr | anteaya: hi, i am. | 00:15 |
fungi | clarkb: zaro: does someone still need to test-drive https://review.openstack.org/61321 on jenkins-dev to confirm it puppets successfully? | 00:15 |
jeblair | clarkb: neat. so basically we just lost half our infrastructure and it was pretty much only noticeable by admins. that makes me happy. :) | 00:15 |
clarkb | fungi: probably, I have also -1'd it because I think it needs a few tweaks | 00:15 |
fungi | oh | 00:15 |
clarkb | zaro: can you address those comments? | 00:16 |
anteaya | AaronGr: to help you parse the above, the path of a commit is user -> Gerrit -> Zuul -> Jenkins job running on a node provided by node pool | 00:16 |
fungi | yes, looks like you just did that | 00:16 |
anteaya | AaronGr: that is simplified but a place to begin for now | 00:16 |
clarkb | jeblair: yup I think I caught it within a couple minutes of it happening | 00:16 |
anteaya | AaronGr: every new patch submitted follows that path for testing | 00:16 |
jeblair | clarkb: i think you did too, but i meant that it seems like we're making headway in increasing fault tolerance. | 00:17 |
clarkb | yup | 00:17 |
AaronGr | anteaya: gerrit = review, zuul = scheduler for jenkins tasks? | 00:17 |
*** bnemec has joined #openstack-infra | 00:18 | |
anteaya | AaronGr: gerrit == review | 00:18 |
anteaya | not sure if I could call zuul a scheduler or not, but definitely a co-ordination layer between Gerrit and Jenkins jobs | 00:18 |
fungi | "scheduler" is not an inappropriate term for it | 00:19 |
*** dstanek has quit IRC | 00:20 | |
AaronGr | sorry, by scheduler i meant 'mechanism for submitting a task to run a jenkins job', in this case to validate a reviewed patch. | 00:20 |
fungi | in fact, http://ci.openstack.org/zuul/ starts out in its introduction, "The main component of Zuul is the scheduler." (so it's more than a scheduler, but that's a lot of it) | 00:20 |
AaronGr | assuming it comes back without failing, what's the next step? (i am taking the ci page line at a time, hadn't hit Z yet *grins*) | 00:21 |
anteaya | AaronGr: that can work for now, but as you understand more prepare to refine the definition | 00:21 |
clarkb | zaro: if you aren't able to make those changes I think I will quickly make them | 00:21 |
anteaya | logs get posted to the static logs server | 00:21 |
AaronGr | anteaya: absolutely, this is helping to balance out what i've been reading with a condensed set of steps. | 00:21 |
anteaya | zuul tells gerrit what to write on the comment on the patch, attributed to Jenkins | 00:22 |
anteaya | AaronGr: great | 00:22 |
anteaya | helps me to say it out loud too | 00:22 |
anteaya | learning by teaching | 00:22 |
AaronGr | so back to gerrit, and then it goes through someone who does the final merge? | 00:22 |
anteaya | so pass or fail, that is what happens | 00:23 |
AaronGr | (appreciated) | 00:23 |
fungi | anteaya: AaronGr: actually, http://docs.openstack.org/infra/publications/overview/ is a good high-level presentation on those topics | 00:23 |
anteaya | AaronGr: a person approves, the patch runs through the gate, if passed it is merged as part of the jenkins job | 00:23 |
anteaya | no human merges | 00:23 |
fungi | it's the "how we try to explain infrastructure to a room full of people in 30 minutes to an hour" | 00:23 |
AaronGr | fungi: ok, the explanations here help the pictures make sense. | 00:24 |
fungi | (complete with pretty pictures) | 00:24 |
*** openstackgerrit has quit IRC | 00:24 | |
fungi | ahh, cool, didn't know if you'd found the presentations yet | 00:24 |
jeblair | i think there are youtube videos of us giving that presentation. | 00:24 |
*** openstackgerrit has joined #openstack-infra | 00:24 | |
*** ekarlso has quit IRC | 00:25 | |
*** ekarlso has joined #openstack-infra | 00:25 | |
AaronGr | so code -> gerrit -> jenkins test -> gerrit -> jenkins -> codebase looks like the oneliner | 00:25 |
AaronGr | (assuming 2 levels of review and no bugs) | 00:25 |
anteaya | I think you have a basic understanding | 00:27 |
*** herndon has quit IRC | 00:27 | |
jeblair | clarkb: after only 4 training runs, i'm starting to get results like this from crm114: | 00:27 |
AaronGr | nice! | 00:27 |
jeblair | bad 1.0000 5.3426 2013-12-11 21:45:56.664 | Details: {u'conflictingRequest': {u'message': u"Cannot 'rebuild' while instance is in task_state rebuilding", u'code': 409}} | 00:27 |
fungi | neat! | 00:27 |
jeblair | the 1.0 is a rather high probability that line is associated with a failure | 00:28 |
*** reed has quit IRC | 00:28 | |
fungi | it's picking up quickly | 00:28 |
clarkb | jeblair: nice, would it be possible to make that an elasticsearch column? | 00:29 |
jeblair | fungi: knowing the answers ahead of time helps. :) and actually changes the problem a bit. | 00:29 |
clarkb | jeblair: right you can train on any job that was successful | 00:29 |
fungi | jeblair: well, true. it's more ham vs spam at that stage | 00:29 |
jeblair | clarkb: that's what i'm thinking | 00:29 |
clarkb | jeblair: I am thinking that if we have a numeric elasticsearch column with some probability then we can search based on that | 00:30 |
clarkb | lucene can do >=0.8 for example on numeric fields iirc | 00:30 |
fungi | oh, neat. yeah filter or sort on bayesian score | 00:30 |
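If the crm114 score were indexed as a numeric field, the Lucene-style range query clarkb mentions could be issued against Elasticsearch roughly like this. The endpoint, index pattern, and the field name `crm114_score` are assumptions for illustration, not the deployed schema.

```python
import json
import requests

# Hypothetical Elasticsearch endpoint and index pattern.
ES_URL = "http://elasticsearch.example.org:9200/logstash-*/_search"

query = {
    "size": 10,
    "query": {
        "query_string": {
            # Lucene numeric range: score of 0.8 or higher
            "query": 'crm114_score:[0.8 TO *] AND filename:"console.html"'
        }
    },
}

resp = requests.post(ES_URL, data=json.dumps(query))
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"].get("message", ""))
```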
lifeless | jeblair: ohh what are you training? | 00:30 |
*** ArxCruz has joined #openstack-infra | 00:31 | |
jeblair | lifeless: i'm seeing if crm114 can help identify log lines that indicate failure; it's something we talked about at the havana summit but needed logstash to exist first. | 00:32 |
anteaya | hey ArxCruz, I'm going to be sending people your way if they have questions about setting up their own infra | 00:32 |
anteaya | you and lifeless | 00:33 |
anteaya | hope that is okay with you | 00:33 |
ArxCruz | anteaya: sure, what's happening ? | 00:33 |
anteaya | ArxCruz: neutron needs all the plugin developers to provide 3rd party testing, there are a few of them | 00:33 |
*** oubiwan__ has joined #openstack-infra | 00:34 | |
anteaya | we suggested they set up their own infra using zuul, devstack-gate and nodepool to do so | 00:34 |
ArxCruz | anteaya: i know IBM is doing something, I've been contacted by some colleagues | 00:34 |
anteaya | ArxCruz: cool | 00:34 |
anteaya | yes if IBM has a neutron plugin they will have to test it | 00:34 |
*** rcleere has quit IRC | 00:35 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 00:35 |
clarkb | zaro: fungi ^ | 00:35 |
fungi | anteaya: i take it the gerrit jenkins plugin solution described on the third-party testing howto was insufficient for most of them? | 00:35 |
*** vipul is now known as vipul-away | 00:35 | |
*** vipul-away is now known as vipul | 00:35 | |
anteaya | fungi: I don't think that even came up | 00:37 |
anteaya | ArxCruz: fungi meeting logs: http://eavesdrop.openstack.org/meetings/networking_third_party_testing/2013/networking_third_party_testing.2013-12-12-17.00.log.html | 00:37 |
lifeless | jeblair: sweeet | 00:37 |
anteaya | etherpad: https://etherpad.openstack.org/p/multi-node-neutron-tempest | 00:37 |
fungi | anteaya: http://ci.openstack.org/third_party.html | 00:37 |
clarkb | fungi: there seems to be a large lack of reading prior art and a lot of what do we do | 00:37 |
* anteaya clicks | 00:37 | |
anteaya | clarkb: large lack | 00:38 |
lifeless | anteaya: so, I will answer questions as needed, but folk should come here firstly. | 00:38 |
lifeless | anteaya: this is the community forum for discussion | 00:38 |
anteaya | fungi: I hadn't read that before either, that looks so simple | 00:39 |
clarkb | jeblair: the more you say about it the more I am interested :) curious to know what your plan for piping the data through crm114 is and what the crm114 setup looks like (iirc crm114 can do several different types of filters) | 00:39 |
clarkb | jeblair: but don't let me distract you | 00:39 |
anteaya | fungi: so all they would need is their own Jenkins with this trigger plugin? | 00:39 |
anteaya | lifeless: understood | 00:40 |
fungi | anteaya: plus stuff for their jenkins to run, and a place to post their logs | 00:40 |
anteaya | I find it helpful to give them names, otherwise they tend to stay silent and just write it all themselves | 00:40 |
clarkb | fungi: a lot of them want to do things that don't lend well to the trigger plugin | 00:40 |
anteaya | fungi: right, so simple | 00:40 |
anteaya | that is true too | 00:41 |
clarkb | they want multi-node baremetal testing with single-use environments and the ability for granular control over what events trigger specific jobs | 00:41 |
clarkb | tl;dr I really think they should look at zuul devstack-gate and nodepool | 00:41 |
anteaya | in any case they should be in here asking questions | 00:41 |
fungi | got it | 00:41 |
anteaya | so brace for onslaught | 00:41 |
anteaya | at least I hope they show up in here asking questions | 00:42 |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/config: Add welcome_message.py to patchset-created trigger https://review.openstack.org/61898 | 00:42 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 00:44 |
clarkb | that should fix the lint error I hope | 00:44 |
*** dstanek has joined #openstack-infra | 00:44 | |
*** sarob has quit IRC | 00:46 | |
*** sarob has joined #openstack-infra | 00:46 | |
*** dstanek has quit IRC | 00:49 | |
*** senk has joined #openstack-infra | 00:50 | |
*** sarob has quit IRC | 00:51 | |
*** vipul is now known as vipul-away | 00:51 | |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Ghe is tripleo-core now. https://review.openstack.org/61900 | 00:51 |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/jeepyb: Add dryrun flag to welcome_message.py https://review.openstack.org/61901 | 00:52 |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/config: Add welcome_message.py to patchset-created trigger https://review.openstack.org/61898 | 00:53 |
*** vipul-away is now known as vipul | 00:53 | |
clarkb | sdague: jeblair fungi https://bugs.launchpad.net/devstack/+bug/1253482 see my last comment there | 00:53 |
uvirtbot | Launchpad bug 1253482 in keystone "Keystone default port in linux local ephemeral port range. Devstack should shift range." [Undecided,In progress] | 00:53 |
*** senk has quit IRC | 00:54 | |
fungi | clarkb: good point | 00:55 |
*** mriedem1 has quit IRC | 00:55 | |
fungi | nodepool would have them marked as used, but jenkins might have undone its marker for those used single-use slaves? | 00:56 |
jeblair | fungi: yeah, the gearman plugin marks them as offline. i'm guessing a restart marks them all online again. | 00:56 |
jeblair | (so not actually a nodepool thing but rather a jenkins thing) | 00:57 |
fungi | next time we crash or even reboot a jenkins master, should we nodepool-delete all ephemeral slaves associated with it before starting again? | 00:57 |
clarkb | jeblair: oh right | 00:57 |
clarkb | fungi: all used slaves | 00:57 |
fungi | right, that | 00:57 |
*** praneshp has quit IRC | 00:57 | |
jeblair | i'm trying to think of something nodepool could do, but the bad behavior is that jenkins brings all previously known slaves online immediately... | 00:58 |
jeblair | i think perhaps once we get to all-dynamic slaves, we could probably write a quick script to remove all slaves from the config before (re-)starting jenkins | 00:58 |
fungi | so, like, here in a moment when we put jenkins02 into prepare-for-shutdown, we should clear used 02 slaves out once it quiesces | 00:58 |
clarkb | jeblair: right. is nodepool doing a temporary offline or a normal offline? | 00:58 |
jeblair | fungi: if you put it in shutdown, they should all go away on their own. | 00:58 |
fungi | ahh | 00:59 |
clarkb | er sorry gearman plugin | 00:59 |
fungi | so really only in the case of unanticipated jenkins failure | 00:59 |
jeblair | clarkb: i think there is only "offline" and "disconnect"; so i think it's just doing offline. disconnect would be problematic. if there's an offline that's more than offline, i'm not familiar with it. | 00:59 |
*** mrodden has quit IRC | 00:59 | |
jeblair | clarkb: (but this part of jenkins is kind of a mess, with internal terms not lining up at all with ui elements, etc) | 01:00 |
clarkb | jeblair: the gui button says "Mark this node temporarily offline" | 01:00 |
clarkb | I am guessing that lines up to offline and ya disconnect would be bad | 01:00 |
jeblair | clarkb: i believe that's what's going on. | 01:00 |
fungi | why is disconnect particularly bad? | 01:01 |
jeblair | anyway, it sounds like we can make an improvement soon. | 01:01 |
jeblair | fungi: it'll stop the running job | 01:01 |
fungi | oh | 01:02 |
fungi | yeah, that's bad. okay, important safety tip | 01:02 |
jeblair | fungi: heh. yeah, this happens immediately when a job starts so that there's no race condition with doing this when it finishes. | 01:02 |
*** ^demon|lunch has quit IRC | 01:02 | |
*** blamar has joined #openstack-infra | 01:02 | |
fungi | sort of the jenkins equivalent of total protonic reversal. got it | 01:03 |
clarkb | would be nice if we had temporary offline functionality that wasn't temporary | 01:03 |
jeblair | we could probably change the node label too, but that's a lot of extra work for gearman-plugin. | 01:03 |
jeblair | may be worth looking into though. | 01:04 |
clarkb | jeblair: and in the long run probably better putting the effort into making jenkins more reliable | 01:04 |
jeblair | (or more gone) | 01:04 |
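For reference, the "mark offline" operation discussed above (as opposed to disconnecting, which would kill the running job) is also reachable through the Jenkins remote API. A rough sketch with python-jenkins, using placeholder URL, credentials, and node name:

```python
import jenkins  # python-jenkins

# Placeholder master URL and credentials.
server = jenkins.Jenkins("https://jenkins02.example.org",
                         username="admin", password="api-token")

node = "precise14"
info = server.get_node_info(node)
if not info["offline"]:
    # Equivalent of "Mark this node temporarily offline" in the UI;
    # any build already running on the node is allowed to finish.
    server.disable_node(node, msg="single-use node, taken out of rotation")
```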
clarkb | fungi: re https://review.openstack.org/#/c/61321/3 did you want to try applying that to nodepool.o.o and jenkins-dev? I can give it a shot tomorrow | 01:05 |
fungi | clarkb: doing now | 01:05 |
*** jhesketh_ has quit IRC | 01:05 | |
openstackgerrit | A change was merged to openstack-dev/hacking: Fix typos of comment in module core https://review.openstack.org/61111 | 01:06 |
*** jerryz__ has quit IRC | 01:06 | |
clarkb | fungi: I would --noop it first time just to make sure we didn't do anything silly :) | 01:06 |
fungi | yup | 01:07 |
anteaya | AaronGr: --noop is no op or no operation, it means a test which stands up a devstack on a node and returns true | 01:09 |
clarkb | anteaya: I think AaronGr is familiar with the puppets | 01:09 |
anteaya | AaronGr: its main purpose that I know of is a placeholder for further tests | 01:09 |
clarkb | in fact I think we might be able to bug him about making puppet stuff better >_> | 01:09 |
anteaya | clarkb: ah okay, will look for him to teach me about the puppets | 01:09 |
anteaya | thank you | 01:09 |
fungi | anteaya: in this case it means i want puppet to pretend to apply new configuration but not actually do it | 01:10 |
fungi | (no op is a very common term in computing) | 01:10 |
anteaya | the pretending to apply configuration is always so satisfying | 01:10 |
anteaya | ah sorry, my mistake then, it was new for me | 01:10 |
fungi | well, when it tells me what it's going to do without actually doing it (and then screwing things up), yes satisfying ;) | 01:10 |
anteaya | guess I am sharing the stuff I wish I knew | 01:11 |
anteaya | hehe | 01:11 |
AaronGr | anteaya: thankfully, one thing i am bringing with me is a bit of puppet, been actively using it for about a year (i run a puppetmaster in my house, for my home network) | 01:13 |
fungi | mmm, puppet agent was not even running on jenkins-dev. going to update it from production before i --noop | 01:13 |
anteaya | AaronGr: awesome, I will ask you many stupid puppet questions | 01:13 |
anteaya | get ready | 01:13 |
AaronGr | anteaya: fair trade, you're welcome to anything i know. have looked through about 40% of infra/config -- i saw at least 10 spots modules could get rewritten or refactored easily | 01:14 |
* StevenK waits for "Is the puppet made out of oak, pine or maple?" | 01:14 | |
AaronGr | plus some really cool places to use more hiera. | 01:14 |
anteaya | AaronGr: awesome | 01:14 |
fungi | ahh, i think the agent must have been left stopped while testing the previously-broken nodepool addition | 01:14 |
anteaya | hiera I don't understand at all, so if you do, power to you | 01:14 |
anteaya | AaronGr: I have a few infra bugs with my name on it that you might like, puppety stuff | 01:15 |
anteaya | AaronGr: been in #openstack-neutron since the summit, hard to wear two hats, at least for me | 01:15 |
*** mrodden has joined #openstack-infra | 01:15 | |
*** oubiwan__ has quit IRC | 01:15 | |
clarkb | fungi: ya that was probably me | 01:16 |
AaronGr | anteaya: i'll happily take them, though not until monday, when i get fully up to speed. after that, pour them on. | 01:16 |
*** mrodden has quit IRC | 01:16 | |
anteaya | AaronGr: fair enough | 01:16 |
fungi | clarkb: see review comment. jenkins_dev_api_user | 01:17 |
*** mrodden has joined #openstack-infra | 01:17 | |
*** ljjjustin has joined #openstack-infra | 01:18 | |
clarkb | fungi: looking | 01:19 |
fungi | clarkb: playing around with fixing it now. i think the template needs to just not use _dev on its vars | 01:19 |
clarkb | fungi: oh right, because we collapsed the variables in puppet | 01:20 |
fungi | yep, those three lines need fixing, but that's not all. new errors once i do | 01:20 |
clarkb | woo | 01:20 |
fungi | updated comments with the new errors | 01:23 |
fungi | though perhaps those are an artifact of --noop | 01:23 |
fungi | ? | 01:23 |
*** ryanpetrello has quit IRC | 01:24 | |
fungi | i can try dropping the --noop and seeing if it applies cleanly | 01:24 |
clarkb | ya those look like artifacts of the --noop | 01:24 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 01:24 |
clarkb | fungi: ^ that removes the dev vars from the erb | 01:24 |
*** jhesketh has joined #openstack-infra | 01:25 | |
*** hogepodge has quit IRC | 01:25 | |
*** oubiwan__ has joined #openstack-infra | 01:26 | |
Alex_Gaynor | So the gate is about 12 hours behind real time. Is that entirely because of resets, or other causes? | 01:28 |
clarkb | Alex_Gaynor: mostly resets | 01:29 |
fungi | clarkb: yet still more new error comment | 01:29 |
clarkb | the sphinx thing and changes getting approved anyways really made it thrash yesterday | 01:29 |
*** syerrapragada1 has quit IRC | 01:29 | |
*** syerrapragada has joined #openstack-infra | 01:30 | |
clarkb | fungi: if you can pip install by hand does it work? | 01:30 |
*** syerrapragada has left #openstack-infra | 01:31 | |
*** praneshp has joined #openstack-infra | 01:31 | |
fungi | it may be missing dependencies for compiling libzmq | 01:31 |
clarkb | that could be | 01:31 |
fungi | ahh, yeah | 01:31 |
fungi | gcc: error trying to exec 'cc1plus': execvp: No such file or directory | 01:31 |
fungi | grah | 01:32 |
clarkb | don't we put build-essential everywhere? | 01:32 |
clarkb | fungi: curious why that wasn't a problem on nodepool.o.o | 01:33 |
fungi | Installed: (none) | 01:34 |
fungi | Candidate: 11.5ubuntu2.1 | 01:34 |
*** praneshp_ has joined #openstack-infra | 01:34 | |
fungi | as opposed to nodepool.o.o, Installed: 11.5ubuntu2.1 | 01:34 |
fungi | so, no, jenkins-dev has nothing telling it to install build-essential apparently | 01:35 |
*** weshay has quit IRC | 01:35 | |
*** praneshp has quit IRC | 01:36 | |
*** praneshp_ is now known as praneshp | 01:36 | |
clarkb | interesting | 01:36 |
*** syerrapragada1 has joined #openstack-infra | 01:36 | |
fungi | clarkb: also, do we still want to restart jenkins02? if so, i can go ahead and put it in shutdown now | 01:37 |
clarkb | I wonder if jeblair installed that by hand, git grep doesn't show it anywhere that nodepool.o.o would pick it up on | 01:37 |
clarkb | fungi: sure | 01:37 |
clarkb | fungi: I am adding build-essential to the nodepool module now | 01:37 |
fungi | i got precise23 and precise40 back online in jenkins, but precise5 and precise9 don't seem to want to relaunch the slave agent even after rebooting (and i'm able to ssh into them fine) | 01:38 |
fungi | jenkins02 is in prepare for shutdown now | 01:38 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 01:38 |
clarkb | fungi: weird | 01:38 |
*** jhesketh__ has joined #openstack-infra | 01:38 | |
clarkb | try that ^ | 01:38 |
*** gyee has quit IRC | 01:38 | |
*** dims has quit IRC | 01:39 | |
*** syerrapragada1 has quit IRC | 01:39 | |
fungi | looks like we've got about 40 minutes to quiescence on jenkins02, based on most recently-started jobs | 01:39 |
fungi | clarkb: success | 01:44 |
clarkb | woot | 01:44 |
fungi | hrm, though nodepool's still not installed | 01:44 |
fungi | that's... no good | 01:44 |
clarkb | oh because the repo didn't refresh the installer? | 01:45 |
clarkb | you can probably just delete the repo and make it reclone | 01:45 |
*** dstanek has joined #openstack-infra | 01:45 | |
fungi | good call | 01:45 |
fungi | i thought it finished rather quickly on that run :( | 01:45 |
clarkb | we should just make everything stateless | 01:45 |
*** xchu has joined #openstack-infra | 01:46 | |
fungi | nodepool==3871acf | 01:47 |
*** sdake_ has quit IRC | 01:47 | |
fungi | much better | 01:47 |
*** sdake_ has joined #openstack-infra | 01:47 | |
*** sdake_ has quit IRC | 01:47 | |
*** sdake_ has joined #openstack-infra | 01:47 | |
fungi | and nodepool list works (though the list is of course empty at the moment) | 01:47 |
fungi | alien-list and alien-image-list return entries though, so auth is definitely sane | 01:48 |
clarkb | fungi: btw what was the process for getting the credential id? did you add a credential to jenkins-dev then go grab an id out of the xml? | 01:48 |
fungi | clarkb: yes | 01:49 |
*** dstanek has quit IRC | 01:50 | |
fungi | i figured out where to find it first by grep'ing the prod one out of jenkins01, then confirmed that jenkins-dev had none, then went into manage credentials and added one which matched the settings in the jenkins01 webui, then fished it out of the xml after that | 01:50 |
fungi | and bob's your uncle | 01:50 |
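The id fungi fished out lives in the credentials store on the Jenkins master. A quick way to list candidate ids from that file is sketched below; the path and element names are assumptions based on typical Jenkins layouts, not verified against this deployment.

```python
import xml.etree.ElementTree as ET

# Typical location on a Jenkins master; adjust for the actual install.
CREDENTIALS_XML = "/var/lib/jenkins/credentials.xml"

tree = ET.parse(CREDENTIALS_XML)
for entry in tree.iter():
    # Credential entries carry an <id> child plus a human-readable
    # <description>; print both so the right one can be matched up
    # against what was entered in the "manage credentials" UI.
    cred_id = entry.findtext("id")
    if cred_id:
        print(cred_id, entry.findtext("description", default=""))
```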
clarkb | fungi: I wonder if we need to change min-ready to 1 in the nodepool config | 01:51 |
fungi | probably | 01:51 |
clarkb | is nodepool building an image currently? | 01:51 |
fungi | it didn't seem to be when i looked, but i'll look again | 01:51 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 01:51 |
fungi | nope. image-list is still empty | 01:51 |
clarkb | that bumps the min-ready number | 01:51 |
fungi | testing | 01:51 |
clarkb | I bet that fixes the image-listing | 01:51 |
*** dims has joined #openstack-infra | 01:52 | |
*** senk has joined #openstack-infra | 01:52 | |
fungi | aha, no... nodepool daemon didn't start | 01:54 |
fungi | clarkb: once i *started* the nodepool initscript, it began to build an image | 01:56 |
fungi | did we skip an ensure => running? | 01:56 |
clarkb | fungi: possibly. I know jeblair isn't a fan of ensure => running | 01:57 |
fungi | yup, modules/nodepool/manifests/init.pp doesn't do it | 01:57 |
fungi | okay, mystery solved | 01:57 |
*** senk has quit IRC | 01:58 | |
*** dstanek has joined #openstack-infra | 01:58 | |
*** senk has joined #openstack-infra | 02:00 | |
*** AaronGr is now known as AaronGr_afk | 02:01 | |
*** CaptTofu has joined #openstack-infra | 02:03 | |
*** yongli has quit IRC | 02:05 | |
*** locke105 has joined #openstack-infra | 02:09 | |
*** senk has quit IRC | 02:10 | |
*** mrodden1 has joined #openstack-infra | 02:16 | |
*** WarrenUsui has quit IRC | 02:17 | |
*** sdake_ has quit IRC | 02:17 | |
*** WarrenUsui has joined #openstack-infra | 02:18 | |
*** mrodden has quit IRC | 02:18 | |
*** senk has joined #openstack-infra | 02:20 | |
*** senk has quit IRC | 02:24 | |
*** senk has joined #openstack-infra | 02:24 | |
*** senk1 has joined #openstack-infra | 02:28 | |
*** senk has quit IRC | 02:29 | |
*** yaguang has joined #openstack-infra | 02:29 | |
*** senk1 has quit IRC | 02:31 | |
*** mriedem has joined #openstack-infra | 02:31 | |
*** senk has joined #openstack-infra | 02:32 | |
*** reed has joined #openstack-infra | 02:36 | |
*** senk has quit IRC | 02:37 | |
*** CaptTofu_ has joined #openstack-infra | 02:42 | |
*** bingbu has joined #openstack-infra | 02:43 | |
*** CaptTofu has quit IRC | 02:44 | |
*** guohliu has joined #openstack-infra | 02:45 | |
*** sarob has joined #openstack-infra | 02:46 | |
*** SushilKM has joined #openstack-infra | 02:57 | |
*** yongli has joined #openstack-infra | 02:58 | |
*** yamahata_ has quit IRC | 03:05 | |
*** beagles has quit IRC | 03:08 | |
*** b3nt_pin has joined #openstack-infra | 03:09 | |
*** b3nt_pin is now known as beagles | 03:09 | |
*** mestery_ has joined #openstack-infra | 03:11 | |
*** mestery has quit IRC | 03:14 | |
*** sdake_ has joined #openstack-infra | 03:16 | |
*** pcrews has quit IRC | 03:16 | |
*** dkliban has quit IRC | 03:18 | |
mordred | jeblair: I support your crm114 efforts. that's really f-ing cool | 03:22 |
clarkb | mordred: it is incredibly cool. I will owe jeblair lots of alcohol I bet | 03:23 |
mordred | clarkb: ++ | 03:23 |
clarkb | mordred: one of the things I have been incredibly happy about the whole logstash elasticsearch thing is that it has enabled folks to hack on it in simple ways without needing too many crazy workarounds for eg logs behind apache | 03:25 |
clarkb | I think that portion of the system has turned out well. It isn't all perfect though. A lot of the data could be modeled with relations and we don't have that | 03:25 |
mordred | clarkb: yah. it's one of the coolest things ever | 03:25 |
mordred | clarkb: I think it also goes to show the power of logging in sane ways | 03:25 |
anteaya | mordred: what country are you in? | 03:27 |
anteaya | good thing you aren't a micro manager, I haven't talked to you in a month | 03:28 |
StevenK | Last I heard, it was .es, but that could have changed | 03:28 |
mordred | anteaya: spain. flying out in a few hours | 03:28 |
mordred | StevenK: I see you've already started playing everyone's favorite game "Where in the world is mordred?" | 03:28 |
mordred | anteaya: that's right - would you like me to micro-manage more? | 03:29 |
mordred | anteaya: go do things! | 03:29 |
mordred | that's all I've got | 03:29 |
clarkb | ha | 03:29 |
clarkb | even when he tries he isn't able :) | 03:29 |
*** dkliban has joined #openstack-infra | 03:30 | |
anteaya | mordred: great, whew thanks for that | 03:30 |
anteaya | I feel better know | 03:30 |
anteaya | now | 03:30 |
mordred | clarkb: do - uhm - different things. perhaps scrumming something is a good choice? | 03:30 |
mordred | clarkb: or kanban. definitely you should kanban something | 03:30 |
clarkb | mordred: got it | 03:30 |
mordred | phew | 03:30 |
* mordred wins | 03:30 | |
clarkb | fungi: mordred wants us to put up a board with post its. do you have room in your lab? | 03:31 |
* mordred has kanbanned his employees employing scrum methodology | 03:31 | |
clarkb | fungi: then we can build a robot to move things around for us | 03:31 |
mordred | clarkb: only if the robot speaks japanese | 03:31 |
*** AlexF has joined #openstack-infra | 03:32 | |
fungi | post-it robot, got it | 03:33 |
fungi | tomorrow maybe | 03:33 |
fungi | mordred: agile something something/ | 03:33 |
StevenK | Agile Robot-Driven Development ? | 03:34 |
fungi | (...kill all humans...) | 03:35 |
fungi | yes | 03:36 |
clarkb | more evidence that everyone from NC is a robot | 03:36 |
anteaya | well at least fungi is online a lot | 03:36 |
anteaya | and mostly gives kind answers | 03:37 |
anteaya | who am I to judge his robot internals | 03:37 |
StevenK | Haha | 03:37 |
*** changbl has joined #openstack-infra | 03:37 | |
*** ryanpetrello has joined #openstack-infra | 03:37 | |
fungi | in the south we say "rowbut" | 03:37 |
clarkb | fungi: like zoidberg | 03:37 |
StevenK | Zoidberg is more 'robbit' | 03:37 |
anteaya | oh I like rowbut | 03:38 |
fungi | newqular rowbuts | 03:38 |
anteaya | robbit rabbit hobbit | 03:38 |
anteaya | ha ha ha | 03:38 |
*** mestery has joined #openstack-infra | 03:39 | |
*** ArxCruz has quit IRC | 03:41 | |
*** mestery_ has quit IRC | 03:42 | |
*** AlexF has quit IRC | 03:45 | |
*** jhesketh__ has quit IRC | 03:52 | |
*** mriedem has quit IRC | 03:52 | |
*** jhesketh__ has joined #openstack-infra | 03:53 | |
*** pabelanger_ has joined #openstack-infra | 03:56 | |
*** weshay has joined #openstack-infra | 03:59 | |
*** sarob has quit IRC | 04:01 | |
*** sarob has joined #openstack-infra | 04:01 | |
*** krtaylor has joined #openstack-infra | 04:02 | |
*** pabelanger_ has quit IRC | 04:02 | |
*** sarob has quit IRC | 04:06 | |
*** sarob has joined #openstack-infra | 04:06 | |
*** sarob has quit IRC | 04:11 | |
*** AaronGr_afk is now known as AaronGr | 04:13 | |
*** AaronGr is now known as AaronGr_Zzz | 04:13 | |
*** SushilKM has quit IRC | 04:15 | |
*** SushilKM has joined #openstack-infra | 04:17 | |
*** jcooley_ has joined #openstack-infra | 04:17 | |
*** SushilKM has quit IRC | 04:20 | |
*** SushilKM has joined #openstack-infra | 04:21 | |
*** sharwell has quit IRC | 04:22 | |
*** pabelanger_ has joined #openstack-infra | 04:22 | |
*** CaptTofu_ has quit IRC | 04:25 | |
*** SushilKM has quit IRC | 04:25 | |
*** CaptTofu has joined #openstack-infra | 04:25 | |
*** pabelanger__ has joined #openstack-infra | 04:27 | |
*** pabelanger__ has quit IRC | 04:27 | |
*** guohliu has quit IRC | 04:29 | |
*** CaptTofu has quit IRC | 04:30 | |
*** esker has joined #openstack-infra | 04:34 | |
*** dkliban has quit IRC | 04:40 | |
*** guohliu has joined #openstack-infra | 04:42 | |
*** dkliban has joined #openstack-infra | 04:43 | |
*** pabelanger_ has quit IRC | 04:44 | |
*** boris-42 has joined #openstack-infra | 04:49 | |
*** dkliban has quit IRC | 05:00 | |
*** sarob has joined #openstack-infra | 05:05 | |
openstackgerrit | Matthew Treinish proposed a change to openstack-infra/devstack-gate: Up the default concurrency on tempest runs https://review.openstack.org/58605 | 05:06 |
*** jcooley_ has quit IRC | 05:06 | |
*** sarob has quit IRC | 05:09 | |
*** sickboy3i has joined #openstack-infra | 05:10 | |
*** guohliu has quit IRC | 05:11 | |
*** sickboy3i has quit IRC | 05:11 | |
*** ryanpetrello has quit IRC | 05:13 | |
*** dstanek has quit IRC | 05:13 | |
*** jcooley_ has joined #openstack-infra | 05:15 | |
*** ryanpetrello has joined #openstack-infra | 05:19 | |
*** ljjjusti1 has joined #openstack-infra | 05:20 | |
*** weshay has quit IRC | 05:21 | |
*** ljjjustin has quit IRC | 05:21 | |
*** guohliu has joined #openstack-infra | 05:22 | |
*** SergeyLukjanov has joined #openstack-infra | 05:27 | |
*** dstanek has joined #openstack-infra | 05:29 | |
*** vkozhukalov has joined #openstack-infra | 05:30 | |
*** nicedice has quit IRC | 05:35 | |
*** reed has quit IRC | 05:36 | |
*** basha has joined #openstack-infra | 05:38 | |
basha | Hi, anyone around? | 05:38 |
clarkb | basha: sort of, whats up? | 05:38 |
basha | facing a small issue with a patch | 05:38 |
basha | clarkb: https://review.openstack.org/#/c/60188/1 | 05:38 |
basha | The jenkins seems to pass. | 05:39 |
basha | but when I look at the logs for python26/27 | 05:39 |
basha | it seems a lil weird | 05:39 |
basha | clarkb: ^^ | 05:39 |
*** Abhishek has joined #openstack-infra | 05:39 | |
*** talluri has joined #openstack-infra | 05:40 | |
clarkb | basha: I see hte logs and exceptions | 05:40 |
clarkb | but nose is reporting that the tests pass | 05:40 |
basha | isn't it a lil weird clarkb? Does this happen often? | 05:41 |
basha | btw whats hte logs? | 05:41 |
basha | :D | 05:41 |
clarkb | basha: I have no idea, that would be questions for glance | 05:41 |
zaro | clarkb: trying to use macbook, i'm sucking :( | 05:41 |
clarkb | zaro: I'm sorry, I can't help you with the aluminum blocks | 05:42 |
basha | clarkb: have u seen this happen before? | 05:42 |
clarkb | basha: infra runs the tests, we aren't typically very good at answering questions about test weirdness | 05:42 |
basha | clarkb: zaro : macs rock!! :P | 05:42 |
*** sdake_ has quit IRC | 05:42 | |
clarkb | the tests themselves fall under the responsibility of the project and the project itself would be most familiar | 05:43 |
*** dstanek has quit IRC | 05:43 | |
basha | clarkb: OK. I was just a lil puzzled that jenkins went green, but the logs seemed to be weird | 05:43 |
zaro | basha:i'm newbie. shortcut keys don't work same on weechat. | 05:43 |
clarkb | I would expect the exception at http://logs.openstack.org/88/60188/6/check/gate-glance-python27/bf13e3b/console.html#_2013-12-12_14_46_49_807 to cause the test to fail but nose doesn't agree with me | 05:43 |
clarkb | zaro: is this a loaner? | 05:43 |
zaro | clarkb: hopefully, but might be perm | 05:44 |
*** Abhishek has quit IRC | 05:44 | |
zaro | clarkb: tara says she's gonna try to get same hp again but she says it's unlikely | 05:44 |
basha | zaro: http://support.apple.com/kb/ht1343 | 05:44 |
clarkb | zaro: :( | 05:44 |
basha | hope that helps :P | 05:44 |
clarkb | basha: jenkins is just looking at the exit code of nose | 05:45 |
basha | clarkb: I've seen that fail before. | 05:45 |
zaro | http://support.apple.com/kb/ht1343 | 05:45 |
zaro | 05:37:47 clarkb | zaro: :( | 05:45 |
clarkb | basha: if nose reports success jenkins reports success, and nose is clearly reporting success | 05:45 |
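The pass/fail decision clarkb describes is purely the exit status of the test command: Jenkins only sees whether the process returned zero, not what got written to the console. A trivial illustration of that contract, with the command line as a placeholder (the real jobs invoke nose via tox):

```python
import subprocess

# Placeholder test command for illustration.
returncode = subprocess.call(["nosetests", "glance/tests/unit"])

# Tracebacks printed to the console (like the OperationalError above)
# don't matter here -- only the process exit status does.
if returncode == 0:
    print("Jenkins reports SUCCESS")
else:
    print("Jenkins reports FAILURE (exit code %d)" % returncode)
```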
zaro | gah! | 05:45 |
basha | clarkb: I guess thats an ignored test perhaps | 05:45 |
clarkb | basha: could also be a nose bug, nose is not the greatest test runner around | 05:46 |
clarkb | or a test bug | 05:46 |
basha | clarkb: I see migrations running, which I haven't seen happen in unit tests | 05:46 |
clarkb | basha: I think the DB migration tests depend on having a mysql and or postgres server laying around configured properly | 05:46 |
zaro | clarkb: my only other option was to get one of those bricks. | 05:47 |
*** dstanek has joined #openstack-infra | 05:47 | |
basha | clarkb: hmmmmm…. I'll look into this in a bit more detail and let u know :) | 05:47 |
clarkb | zaro: I would've gotten a brick :) | 05:47 |
basha | clarkb: thanks a lot | 05:47 |
zaro | basha: that did not help. trying to figure out why alt-j doesn't work in weechat. | 05:48 |
zaro | clarkb: are you kidding me? that thing is like 10 lbs. | 05:48 |
basha | zaro: I dont use weechat :D | 05:48 |
basha | :P | 05:48 |
clarkb | zaro: I wouldn't carry it anywhere | 05:50 |
clarkb | but at least I would have a useable machine at my desk | 05:50 |
clarkb | basha: http://logstash.openstack.org/#eyJzZWFyY2giOiIgYnVpbGRfbmFtZTpnYXRlLWdsYW5jZS1weXRob24yKiBBTkQgbWVzc2FnZTpcIk9wZXJhdGlvbmFsRXJyb3JcIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiODY0MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIiwic3RhbXAiOjEzODY5MTM4MTE2NTl9 looks like that exception | 05:50 |
clarkb | happens quite a bit | 05:50 |
*** yamahata_ has joined #openstack-infra | 05:51 | |
basha | clarkb: yeah. I've seen it break couple of times but the nose still passes. | 05:52 |
zaro | clarkb: my broken laptop can work as a desktop. | 05:53 |
*** jcooley_ has quit IRC | 05:54 | |
*** jcooley_ has joined #openstack-infra | 05:54 | |
clarkb | zaro: oh its just the display that is bad? I bet you could replace it | 05:57 |
zaro | clarkb: you got any experience with that? | 05:58 |
clarkb | sort of, you need to find a replacement display that is compatible, then when you do the teardown document everything, otherwise it doesn't go back together | 05:59 |
*** jcooley_ has quit IRC | 05:59 | |
zaro | display is probably 80% of cost anyway. | 06:00 |
clarkb | not really, those laptops have crappy cheap displays | 06:00 |
clarkb | the cpu and related peripherals are typically the costly bits | 06:00 |
zaro | yeah, they do. | 06:01 |
zaro | these things look pretty sealed. probably need special tools or something. | 06:01 |
*** jcooley_ has joined #openstack-infra | 06:02 | |
zaro | can't even scrollback on this mac | 06:03 |
*** dstanek has quit IRC | 06:03 | |
*** jcooley_ has quit IRC | 06:04 | |
*** jcooley_ has joined #openstack-infra | 06:04 | |
clarkb | https://www.laptopscreen.com/English/model/HP-Compaq/ELITEBOOK~FOLIO~9470M/ is the part | 06:05 |
*** sarob has joined #openstack-infra | 06:06 | |
*** Abhishek_ has joined #openstack-infra | 06:08 | |
zaro | how come it looks so easy on that image? i don't even see the screws on the display. | 06:08 |
clarkb | they always make it look easy :) | 06:10 |
clarkb | "flex the inside edges of the bottom edge (1), the left and right sides (2), and the top edge (3) of the display bezel until the display bezel disengages from the display enclosure" | 06:12 |
zaro | hey maybe it's the type of shell. mac default shell is xterm. what is it on ubuntu? | 06:12 |
clarkb | zaro: you were using konsole which probably presents itself as an xterm | 06:12 |
clarkb | swapping the display doesn't look too bad if you can pop the bezel off | 06:13 |
zaro | dang it! page up on mac scrolls the screen, not the backscroll | 06:13 |
zaro | pretty good pdx/hou game on tnt | 06:14 |
zaro | you're right about aldridge, he da man. | 06:16 |
clarkb | and batum and lillard and matthews | 06:18 |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Pin Sphinx. https://review.openstack.org/61921 | 06:19 |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Ghe is tripleo-core now. https://review.openstack.org/61900 | 06:19 |
*** jcooley_ has quit IRC | 06:19 | |
zaro | ok. i'm done mucking with this for tonight. good night. | 06:20 |
clarkb | night | 06:20 |
*** denis_makogon has joined #openstack-infra | 06:20 | |
*** vkozhukalov has quit IRC | 06:21 | |
*** slong_ has quit IRC | 06:24 | |
*** ryanpetrello has quit IRC | 06:28 | |
*** SushilKM has joined #openstack-infra | 06:39 | |
*** vogxn has joined #openstack-infra | 06:41 | |
*** basha has quit IRC | 06:42 | |
*** bingbu has quit IRC | 06:50 | |
*** sarob has quit IRC | 06:54 | |
*** basha has joined #openstack-infra | 06:54 | |
*** sarob has joined #openstack-infra | 06:55 | |
openstackgerrit | A change was merged to openstack/requirements: Add oslo.rootwrap to global requirements https://review.openstack.org/61738 | 06:59 |
*** sarob has quit IRC | 06:59 | |
*** NikitaKonovalov has joined #openstack-infra | 07:03 | |
*** basha has quit IRC | 07:05 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 07:05 | |
*** bingbu has joined #openstack-infra | 07:06 | |
*** _SergeyLukjanov has quit IRC | 07:06 | |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Pin Sphinx. https://review.openstack.org/61921 | 07:14 |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Ghe is tripleo-core now. https://review.openstack.org/61900 | 07:14 |
*** SergeyLukjanov has joined #openstack-infra | 07:19 | |
*** sarob has joined #openstack-infra | 07:25 | |
*** yolanda has joined #openstack-infra | 07:28 | |
*** dstanek has joined #openstack-infra | 07:30 | |
*** basha has joined #openstack-infra | 07:31 | |
*** basha has quit IRC | 07:31 | |
*** rcarrillocruz has joined #openstack-infra | 07:33 | |
*** dstanek has quit IRC | 07:35 | |
*** senk has joined #openstack-infra | 07:40 | |
*** bingbu has quit IRC | 07:41 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 07:44 | |
*** _SergeyLukjanov has quit IRC | 07:45 | |
*** sergmelikyan has joined #openstack-infra | 07:46 | |
sergmelikyan | >>/msg chanserv access #murano add openstackinfra +AFRfiorstv | 07:46 |
*** Abhishek_ has quit IRC | 07:46 | |
sergmelikyan | Why does the bot require such privileges? | 07:46 |
sergmelikyan | And are they required to merge https://review.openstack.org/61703? | 07:48 |
*** andreaf has joined #openstack-infra | 07:51 | |
*** vkozhukalov has joined #openstack-infra | 07:52 | |
*** oubiwan__ has quit IRC | 07:53 | |
*** dizquierdo has joined #openstack-infra | 07:54 | |
*** jcoufal has joined #openstack-infra | 07:55 | |
*** sarob has quit IRC | 07:57 | |
*** vkozhukalov has quit IRC | 08:02 | |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Adding an option to use qpid instead of rabbit or zeromq https://review.openstack.org/55829 | 08:04 |
*** flaper87|afk is now known as flaper87 | 08:06 | |
*** vogxn1 has joined #openstack-infra | 08:06 | |
*** vogxn has quit IRC | 08:08 | |
*** praneshp has quit IRC | 08:11 | |
*** vogxn1 has quit IRC | 08:11 | |
*** SergeyLukjanov has joined #openstack-infra | 08:12 | |
*** bingbu has joined #openstack-infra | 08:13 | |
*** vkozhukalov has joined #openstack-infra | 08:14 | |
*** praneshp has joined #openstack-infra | 08:16 | |
*** basha has joined #openstack-infra | 08:18 | |
*** nprivalova has joined #openstack-infra | 08:23 | |
*** denis_makogon has quit IRC | 08:25 | |
*** rcarrillocruz1 has joined #openstack-infra | 08:26 | |
*** sarob has joined #openstack-infra | 08:26 | |
*** rcarrillocruz has quit IRC | 08:28 | |
*** praneshp has quit IRC | 08:29 | |
*** rongze has joined #openstack-infra | 08:29 | |
*** senk has quit IRC | 08:30 | |
*** xchu has quit IRC | 08:31 | |
*** sarob has quit IRC | 08:34 | |
*** iv_m has joined #openstack-infra | 08:38 | |
*** bingbu has quit IRC | 08:38 | |
*** salv-orlando has joined #openstack-infra | 08:39 | |
*** jpich has joined #openstack-infra | 08:41 | |
*** sHellUx has joined #openstack-infra | 08:45 | |
SergeyLukjanov | fungi, mordred, clarkb, jeblair, hey guys | 08:45 |
SergeyLukjanov | Queue lengths: 245 events, 382 results | 08:45 |
SergeyLukjanov | ^^ in zuul, looks not very good | 08:46 |
SergeyLukjanov | many of jobs are failing with https://jenkins02.openstack.org/job/gate-cinder-docs/3172/console | 08:48 |
*** afazekas has joined #openstack-infra | 08:48 | |
*** yongli has quit IRC | 08:48 | |
*** dizquierdo has quit IRC | 08:49 | |
*** bingbu has joined #openstack-infra | 08:51 | |
*** nosnos has joined #openstack-infra | 08:55 | |
*** apevec has joined #openstack-infra | 08:58 | |
*** apevec has joined #openstack-infra | 08:58 | |
*** yassine has joined #openstack-infra | 08:58 | |
*** yassine has quit IRC | 09:00 | |
*** yassine has joined #openstack-infra | 09:00 | |
*** yassine has quit IRC | 09:02 | |
apevec | java.io.IOException: Remote call on precise14 failed - is that a broken Jenkins slave? | 09:03 |
apevec | http://logs.openstack.org/32/61532/1/gate/gate-heat-python27/5d7c9dc/console.html | 09:03 |
apevec | that failed reverification of 61532 which blocks Heat CVE fixes on stable/havana :( | 09:04 |
*** yamahata_ has quit IRC | 09:04 | |
*** rongze has quit IRC | 09:05 | |
*** yassine has joined #openstack-infra | 09:06 | |
*** yassine has quit IRC | 09:06 | |
openstackgerrit | Ruslan Kamaldinov proposed a change to openstack-infra/config: Add jenkins03, jenkins04 to cacti https://review.openstack.org/61938 | 09:07 |
*** yassine has joined #openstack-infra | 09:07 | |
openstackgerrit | Abhishek Chanda proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1249889 https://review.openstack.org/61939 | 09:10 |
uvirtbot | Launchpad bug 1249889 in tempest "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern[compute,image,volume] failed" [Undecided,Invalid] https://launchpad.net/bugs/1249889 | 09:10 |
*** kruskakli has left #openstack-infra | 09:12 | |
*** Abhishek_ has joined #openstack-infra | 09:14 | |
*** derekh has joined #openstack-infra | 09:16 | |
*** bingbu has quit IRC | 09:17 | |
*** SergeyLukjanov has quit IRC | 09:22 | |
*** sarob has joined #openstack-infra | 09:26 | |
*** jooools has joined #openstack-infra | 09:34 | |
*** rongze has joined #openstack-infra | 09:34 | |
*** sHellUx has quit IRC | 09:45 | |
*** hashar has joined #openstack-infra | 09:48 | |
*** zhiyan has joined #openstack-infra | 09:48 | |
*** rossella_s has joined #openstack-infra | 09:49 | |
*** nosnos has quit IRC | 09:51 | |
*** johnthetubaguy has joined #openstack-infra | 09:56 | |
*** sarob has quit IRC | 09:58 | |
*** saschpe_ has joined #openstack-infra | 10:01 | |
*** saschpe has quit IRC | 10:02 | |
*** ArxCruz has joined #openstack-infra | 10:06 | |
andreaf | hi - I'm working on a tempest change which has the following implication: listing servers requires tempest.conf to be available. gate-tempest-py27 is failing because tempest.conf is missing. Is it possible that the config file has not been generated yet when this check runs? I thought devstack would create tempest.conf at setup. What am I missing? | 10:07 |
*** masayukig has quit IRC | 10:08 | |
*** basha has quit IRC | 10:08 | |
*** apevec has quit IRC | 10:09 | |
*** SergeyLukjanov has joined #openstack-infra | 10:10 | |
*** jhesketh__ has quit IRC | 10:12 | |
openstackgerrit | Alexandre Levine proposed a change to openstack-infra/config: Adding empty gce-api project to stackforge https://review.openstack.org/61954 | 10:13 |
*** dizquierdo has joined #openstack-infra | 10:16 | |
*** nprivalova has quit IRC | 10:16 | |
*** apevec has joined #openstack-infra | 10:21 | |
*** apevec has joined #openstack-infra | 10:22 | |
openstackgerrit | Alexandre Levine proposed a change to openstack-infra/config: Adding empty gce-api project to stackforge https://review.openstack.org/61954 | 10:22 |
*** sarob has joined #openstack-infra | 10:26 | |
*** guohliu has quit IRC | 10:29 | |
apevec | ttx, thanks for filing bug 1260654 that slave seems really broken: https://jenkins02.openstack.org/computer/precise14/builds | 10:31 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,New] https://launchpad.net/bugs/1260654 | 10:31 |
*** sarob has quit IRC | 10:31 | |
apevec | only gate-noop works (what does it do?) | 10:31 |
*** ArxCruz has quit IRC | 10:31 | |
apevec | can that machine be removed from the pool? | 10:31 |
*** dstanek has joined #openstack-infra | 10:33 | |
ttx | apevec: it can, but not by me | 10:33 |
*** flaper87 is now known as flaper87|afk | 10:33 | |
ttx | We don't have a good answer yet for borked slaves in european mornings | 10:33 |
apevec | ok, then it will be Russian roulette in the gate | 10:33 |
ttx | since the people with power to kill them are not up | 10:33 |
ttx | mordred, fungi ^ | 10:34 |
apevec | license to kill | 10:34 |
*** ljjjusti1 has quit IRC | 10:35 | |
*** dstanek has quit IRC | 10:37 | |
*** nprivalova has joined #openstack-infra | 10:40 | |
BobBall | we need a batphone | 10:42 |
*** chandankumar has quit IRC | 10:43 | |
chmouel | ttx: we are trying to find resource here at eNovance who can help infra during european times | 10:45 |
*** senk has joined #openstack-infra | 10:45 | |
*** chandankumar has joined #openstack-infra | 10:46 | |
*** senk has quit IRC | 10:48 | |
*** senk has joined #openstack-infra | 10:49 | |
openstackgerrit | Vadim Rovachev proposed a change to openstack-infra/devstack-gate: Added ceilometer-anotification to enabled services https://review.openstack.org/61958 | 10:53 |
*** senk has quit IRC | 10:53 | |
*** paul-- has quit IRC | 10:56 | |
*** sergmelikyan has quit IRC | 11:00 | |
*** paul-- has joined #openstack-infra | 11:04 | |
*** marun has joined #openstack-infra | 11:05 | |
apevec | more bad slaves, now precise20 https://jenkins02.openstack.org/job/gate-nova-python27/13176/console | 11:05 |
apevec | Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.tools.ant.Location | 11:06 |
apevec | looks like it lost some java packages?? | 11:06 |
*** rongze has quit IRC | 11:06 | |
*** lcestari has joined #openstack-infra | 11:09 | |
*** markmc has joined #openstack-infra | 11:19 | |
ttx | ew | 11:21 |
openstackgerrit | Darragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Use yaml local tags to support including files https://review.openstack.org/48783 | 11:26 |
*** sarob has joined #openstack-infra | 11:26 | |
*** marun has quit IRC | 11:29 | |
*** rongze has joined #openstack-infra | 11:38 | |
*** katyafervent has quit IRC | 11:45 | |
*** afazekas has quit IRC | 11:47 | |
*** rongze has quit IRC | 11:49 | |
*** zhiyan has quit IRC | 11:49 | |
*** sandy__ has quit IRC | 11:53 | |
*** sandy__ has joined #openstack-infra | 11:54 | |
*** sandy__ has quit IRC | 11:57 | |
*** sarob has quit IRC | 11:58 | |
*** jcoufal has quit IRC | 12:01 | |
*** jcoufal has joined #openstack-infra | 12:02 | |
*** nprivalova has quit IRC | 12:03 | |
*** nprivalova has joined #openstack-infra | 12:04 | |
sdague | so was there a bug against openstack ci on the jenkins crash? | 12:05 |
yassine | Hello all, | 12:05 |
yassine | i got some issues with my patch https://review.openstack.org/#/c/60499 it looks like zookeeper package was not | 12:05 |
yassine | successfully installed in some slaves despite this patch which adds zookeeper in the Puppet manifest https://review.openstack.org/#/c/60509 for jenkins slaves. Is it a known issue? How could it be fixed? :/ | 12:05 |
apevec | sdague, ttx filed bug 1260654 for one instance of NoClassDefFoundError | 12:07 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,New] https://launchpad.net/bugs/1260654 | 12:07 |
sdague | apevec: I actually meant the reusing of the slaves | 12:07 |
sdague | which caused all the jobs to fail | 12:07 |
sdague | because right now we are sticking it on another bug | 12:08 |
sdague | which wasn't really the story | 12:08 |
apevec | oh, I don't know about "reusing of the slaves", what was that? | 12:09 |
*** fifieldt has quit IRC | 12:09 | |
sdague | last night, it's why everything failed for a while | 12:09 |
apevec | are the missing classes on the slave related or not? | 12:10 |
sdague | not sure | 12:11 |
*** ruhe has joined #openstack-infra | 12:13 | |
*** rongze has joined #openstack-infra | 12:15 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/elastic-recheck: add query for jenkins crash https://review.openstack.org/61974 | 12:16 |
*** ianw has quit IRC | 12:18 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: add query for jenkins crash https://review.openstack.org/61974 | 12:19 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 12:24 | |
*** dstanek has joined #openstack-infra | 12:24 | |
*** _SergeyLukjanov has quit IRC | 12:25 | |
*** mfer has joined #openstack-infra | 12:25 | |
*** sarob has joined #openstack-infra | 12:26 | |
*** SergeyLukjanov has joined #openstack-infra | 12:33 | |
*** Abhishek_ has quit IRC | 12:37 | |
*** thomasem has joined #openstack-infra | 12:48 | |
yassine | could someone please answer my question :$ | 12:49 |
*** jcoufal has quit IRC | 12:49 | |
openstackgerrit | Cyril Roelandt proposed a change to openstack/requirements: HTTPretty: update to 0.7.1 https://review.openstack.org/61981 | 12:49 |
*** jcoufal has joined #openstack-infra | 12:50 | |
*** HenryG has quit IRC | 12:51 | |
*** dolphm has joined #openstack-infra | 12:51 | |
*** sandywalsh has joined #openstack-infra | 12:53 | |
*** dizquierdo has quit IRC | 12:56 | |
*** HenryG has joined #openstack-infra | 12:58 | |
*** sarob has quit IRC | 12:59 | |
*** yaguang has quit IRC | 13:02 | |
*** dkliban has joined #openstack-infra | 13:02 | |
*** marun has joined #openstack-infra | 13:03 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Stories and Tasks search https://review.openstack.org/60515 | 13:05 |
*** dstanek has quit IRC | 13:06 | |
*** dstanek has joined #openstack-infra | 13:10 | |
*** oubiwan__ has joined #openstack-infra | 13:15 | |
*** sandywalsh has quit IRC | 13:15 | |
portante | jog0, clarkb, sdague, under what category do I file this bug: | 13:17 |
portante | http://logs.openstack.org/87/61587/1/gate/gate-swift-pep8/3291350/console.html | 13:17 |
sdague | yeh, we definitely need someone in .eu to get skilled up on infra. Maybe we tell mordred he has to move to barcelona :) | 13:21 |
ruhe | already answered in qa channel. for everyone else jenkins error is filed in https://bugs.launchpad.net/bugs/1260654 | 13:21 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,Confirmed] | 13:21 |
sdague | ruhe: thanks | 13:21 |
*** oubiwan__ has quit IRC | 13:22 | |
portante | sdague: can we get an elastic recheck for these kinds of infra bugs? they are likely to happen again in the future at some point | 13:22 |
*** dizquierdo has joined #openstack-infra | 13:23 | |
ruhe | sdague, we (mirantis) do plan to dedicate a couple of engineers to work on infra full-time, but it sure will take a lot of time to get accustomed to infra | 13:23 |
sdague | portante: there is one, but right now e-r only looks for details in tempest/devstack jobs | 13:24 |
dims | portante, its easy to submit a review against elastic-recheck, just need to add a yaml file - https://github.com/openstack-infra/elastic-recheck/tree/master/queries :) | 13:24 |
sdague | it's a future enhancement to have it look at all the jobs | 13:24 |
*** dcramer_ has joined #openstack-infra | 13:25 | |
sdague | see scroll back 16 lines where I added it to e-r | 13:25 |
*** derekh has quit IRC | 13:26 | |
*** sarob has joined #openstack-infra | 13:26 | |
*** esker has quit IRC | 13:27 | |
dims | sdague, ah cool. i just started looking at the gate queue and was expecting 50+ but found just a few, and was wondering when i saw this :) | 13:27 |
sdague | dims: yeh, so when jog0 started the assumption was the only things that actually failed in the gate were races caused by a real cloud | 13:27 |
sdague | i.e. there is no reason for docs and unit tests to fail in the gate, they should have passed in check | 13:28 |
sdague | but external events can make them fail (as well as bad reviewers) | 13:28 |
sdague | so they need to be added | 13:28 |
dims | sdague, right | 13:28 |
sdague | and that's part of the code which needs some more brutal refactoring to get there | 13:29 |
portante | if we can detect these events, can we avoid making users do a recheck and just have infra retry the job? | 13:29 |
sdague | portante: so the issue is we are skipping processing them on the elastic recheck side | 13:29 |
sdague | because processing a job type requires actually knowing all the files that might need to have gotten to elastic search, as there is a delay | 13:30 |
portante | yes, thanks, understood | 13:30 |
sdague | portante: and, in general, we don't want to do auto recheck, because experience has shown that no one actually looks at the issues | 13:31 |
*** dcramer_ has quit IRC | 13:31 | |
sdague | the point of e-r is to help us classify the "worst" races we are seeing and grouping them, so people can prioritize these | 13:32 |
sdague | and get them fixed | 13:32 |
portante | developer frustration with rechecks is still growing though, and we need to address that too. | 13:33 |
sdague | portante: sure, and the way to fix that is to fix the underlying issues | 13:33 |
sdague | because if we just autorechecked, all it would mean is the gate merge time would grow to over a day as everything crashes through, blows up, is automatically readded. | 13:34 |
portante | sdague: certainly. though I am thinking that if a job fails because of an infra issue, and it can be moved to another instance and retried, that seems like a worthwhile investment | 13:34 |
portante | sdague, I am not suggesting recheck the entire job | 13:35 |
portante | just have the ci re-run the docs job on another instance when it detects that there is an infrastructure issue | 13:35 |
sdague | portante: sure, there could be infra recovery for exactly this kind of issue. I'd like the ci team to address that | 13:35 |
portante | where do they live? #openstack-ci | 13:35 |
sdague | here | 13:35 |
sdague | ci/infra | 13:36 |
sdague | but the core team is basically west coast US, plus fungi on the east US, so they aren't awake yet | 13:36 |
*** oubiwan__ has joined #openstack-infra | 13:37 | |
*** paul-- has quit IRC | 13:38 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Stories and Tasks search https://review.openstack.org/60515 | 13:40 |
portante | sdague: thanks | 13:40 |
*** jcoufal has quit IRC | 13:41 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Added basic popup messages https://review.openstack.org/59706 | 13:41 |
*** dhellmann_ is now known as dhellmann | 13:42 | |
*** dolphm has quit IRC | 13:43 | |
*** weshay has joined #openstack-infra | 13:48 | |
*** dkliban has quit IRC | 13:50 | |
*** dolphm has joined #openstack-infra | 13:51 | |
*** dolphm_ has joined #openstack-infra | 13:52 | |
*** dprince has joined #openstack-infra | 13:55 | |
*** dolphm has quit IRC | 13:56 | |
*** bpokorny has joined #openstack-infra | 13:57 | |
*** sarob has quit IRC | 13:58 | |
*** rongze has quit IRC | 13:59 | |
*** oubiwan__ has quit IRC | 14:02 | |
openstackgerrit | Cyril Roelandt proposed a change to openstack/requirements: HTTPretty: update to 0.7.1 https://review.openstack.org/61981 | 14:07 |
*** jpich has quit IRC | 14:07 | |
*** mriedem has joined #openstack-infra | 14:08 | |
sdague | ttx: I have a new hack idea, if you want to try it with your email thing | 14:09 |
sdague | any time a bug gets too big to modify via the web, add launchpad as an affected project | 14:09 |
sdague | with a comment that launchpad is getting added because we can no longer modify this bug in launchpad | 14:10 |
ttx | my email thing is not magic, just applying https://help.launchpad.net/Bugs/EmailInterface | 14:10 |
sdague | I'm actually super annoyed that I've got 2 bugs in the tempest queue that are dead wood | 14:10 |
ttx | (just need your PGP publickey registered with LP) | 14:10 |
sdague | #1179008 rename requires files to standard names | 14:10 |
ttx | sdague: maybe the other one is not as blocked | 14:10 |
sdague | #1214176 Fix copyright headers to be compliant with Foundation policies | 14:11 |
ttx | let me try that second one | 14:11 |
sdague | could you get the LP team to just delete those bugs entirely | 14:11 |
*** ruhe is now known as ruhe_ | 14:12 | |
ttx | bah, submit request failure | 14:12 |
ttx | sdague: they usually reply to launchpad questions. Let me try that | 14:13 |
sdague | I think we should just delete any bug that's gotten out of control, because it just causes problems with projects that show up late and try to fix it | 14:14 |
*** oubiwan__ has joined #openstack-infra | 14:15 | |
*** vkozhukalov has quit IRC | 14:15 | |
fungi | what's the urgent machine to remove? | 14:16 |
sdague | fungi: one sec | 14:16 |
*** jcoufal has joined #openstack-infra | 14:17 | |
*** blamar has quit IRC | 14:17 | |
*** ruhe_ has quit IRC | 14:18 | |
sdague | fungi: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiamF2YS5pby5JT0V4Y2VwdGlvblwiICAgQU5EIG1lc3NhZ2U6XCJSZW1vdGUgY2FsbCBvblwiICAgQU5EIG1lc3NhZ2U6XCJmYWlsZWRcIiAgIEFORCBmaWxlbmFtZTpcImNvbnNvbGUuaHRtbFwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiJhbGwiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzg2OTQ0Mjg0Nzg3fQ== | 14:18 |
sdague | precise 14 and precise 20 it seems | 14:18 |
ttx | sdague: let's see how that goes: https://answers.launchpad.net/launchpad/+question/240748 | 14:19 |
fungi | sdague: i'll dampen them | 14:19 |
*** SushilKM has quit IRC | 14:20 | |
fungi | it's possible something went weird with the slave agent connection to them when we rebooted jenkins02 | 14:20 |
*** eharney has joined #openstack-infra | 14:22 | |
ttx | fungi: let us know, we'll restart the stable/* jobs afterwards | 14:24 |
fungi | they're already offline as of a minute or so | 14:25 |
ttx | fungi: ok, retrying then. | 14:25 |
fungi | https://jenkins02.openstack.org/computer/precise14/ and https://jenkins02.openstack.org/computer/precise20/ | 14:25 |
*** russellb is now known as rustlebee | 14:25 | |
fungi | i'll work some magic to get them back into service | 14:25 |
ttx | fungi: what is the appropriate keyword to reverify in that case ? | 14:26 |
*** sarob has joined #openstack-infra | 14:26 | |
ttx | I can abuse bug 1260654 | 14:26 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,Confirmed] https://launchpad.net/bugs/1260654 | 14:26 |
*** flaper87|afk is now known as flaper87 | 14:26 | |
fungi | ttx: you can just reapprove them instead of using reverify if you're core for that (which you are), or if we have a bug open on this already then you could reverify against that bug | 14:26 |
fungi | that works | 14:27 |
apevec | ttx, why would that be abuse? | 14:27 |
ttx | apevec: it may or may not match exactly that error :) | 14:27 |
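The Gerrit comment syntax under discussion is simply the following (a sketch; as noted above, the bug may or may not match the actual failure):

```
reverify bug 1260654
```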
fungi | for the record, slave agent failures look likely... https://jenkins02.openstack.org/job/gate-nova-python27/13301/consoleText https://jenkins02.openstack.org/job/gate-neutron-docs/3708/consoleText | 14:27 |
*** dansmith is now known as damnsmith | 14:27 | |
*** oubiwan__ has quit IRC | 14:28 | |
apevec | yeah, some had multiple failures, I've sent 2013.2.1 update email to you specifying what failed where | 14:28 |
fungi | precise14 seemed to be dying straight away, but precise20 was getting through the job and then bailing on artifact collection | 14:28 |
ttx | apevec: horizon/heat requirements sync reverified | 14:28 |
apevec | thanks | 14:28 |
*** yamahata_ has joined #openstack-infra | 14:29 | |
*** prad has joined #openstack-infra | 14:32 | |
*** ilyashakhat has joined #openstack-infra | 14:35 | |
*** ilyashakhat has quit IRC | 14:36 | |
*** ruhe has joined #openstack-infra | 14:36 | |
*** bknudson has joined #openstack-infra | 14:36 | |
*** sarob has quit IRC | 14:37 | |
*** andreaf has quit IRC | 14:38 | |
*** saper_ is now known as saper | 14:38 | |
fungi | precise14 and 20 rebooted and back in service, watching to make sure jobs complete on them now | 14:39 |
fungi | this ran to completion on precise14... https://jenkins02.openstack.org/job/gate-puppet-neutron-puppet-syntax/83/console | 14:39 |
fungi | and this on precise20... https://jenkins02.openstack.org/job/gate-puppet-neutron-puppet-unit-2.7/121/console | 14:40 |
fungi | should be sane now | 14:41 |
*** Abhishe__ has joined #openstack-infra | 14:41 | |
apevec | fungi, so what was it? | 14:41 |
*** smarcet has joined #openstack-infra | 14:41 | |
apevec | dolphm_, please approve https://review.openstack.org/61425 | 14:41 |
fungi | java exceptions when the master was trying to communicate with the slave agent. there's every chance they lost their sanity during the reboot of jenkins02 last night | 14:42 |
*** dstanek has quit IRC | 14:42 | |
fungi | well, s/reboot/restart/ | 14:42 |
openstackgerrit | David Kranz proposed a change to openstack-infra/devstack-gate: Always dump errors to console https://review.openstack.org/61850 | 14:43 |
*** xchu has joined #openstack-infra | 14:43 | |
fungi | jenkins01 shot itself in the head last night over a jvm oom condition so we had to scramble to get everything back up and running after that, and noticed jenkins02 was using at least as much memory as 01 had been so we did a controlled restart of jenkins on it as well | 14:43 |
*** xchu has quit IRC | 14:43 | |
*** dstanek has joined #openstack-infra | 14:43 | |
sdague | fungi: yeh, seems like it would be nice to have something that auto downs these nodes on a jenkins stack trace capture | 14:44 |
fungi | but in the process, i'm betting something happened with slave agent communication to precise14 and 20 as it booted back up | 14:44 |
*** rongze has joined #openstack-infra | 14:44 | |
sdague | this seems to happen every 3 weeks or so, and basically kills a whole dev cycle for .eu | 14:44 |
fungi | sdague: i wonder whether there's a jenkins plugin for that | 14:44 |
*** xchu has joined #openstack-infra | 14:44 | |
*** rnirmal has joined #openstack-infra | 14:44 | |
*** HenryG_ has joined #openstack-infra | 14:45 | |
ruhe | fungi, sdague: i guess a monitoring system might be enough to prevent such events | 14:45 |
fungi | sdague: but regardless, we're already in progress shifting jobs to single-use slaves, which is our preferred near-term solution to this (as opposed to the longer-term "get rid of jenkins entirely" solution) | 14:45 |
*** jd__ has quit IRC | 14:46 | |
*** iv_m has quit IRC | 14:46 | |
*** jd__ has joined #openstack-infra | 14:46 | |
*** iv_m has joined #openstack-infra | 14:46 | |
*** hughsaunders has quit IRC | 14:46 | |
fungi | ruhe: interestingly, probably not. there was nothing outwardly unusual about the condition of those slaves. we'd need to interrogate the jenkins master and have it perform some sort of communication and artifact collection tests as a canary | 14:46 |
*** hughsaunders has joined #openstack-infra | 14:46 | |
*** xchu has quit IRC | 14:46 | |
fungi | nontrivial | 14:46 |
fungi | probably special jobs which would need to be run between normal jobs to detect a condition like that | 14:47 |
*** xchu has joined #openstack-infra | 14:47 | |
*** xchu has quit IRC | 14:47 | |
*** blamar has joined #openstack-infra | 14:48 | |
fungi | sdague: at the moment, there are already a handful of infra jobs we've shifted from long-running slaves to bare (non-devstack) single-use slaves, with great success. it's just a matter of slowly shifting the remainder | 14:48 |
*** xchu has joined #openstack-infra | 14:48 | |
* fungi will brb | 14:48 | |
*** HenryG has quit IRC | 14:49 | |
*** dkliban has joined #openstack-infra | 14:50 | |
*** andreaf has joined #openstack-infra | 14:58 | |
*** jcooley_ has joined #openstack-infra | 15:00 | |
*** markmcclain has joined #openstack-infra | 15:02 | |
*** esker has joined #openstack-infra | 15:05 | |
dolphm_ | apevec: done | 15:06 |
apevec | thanks! | 15:06 |
*** rcleere has joined #openstack-infra | 15:09 | |
*** pabelanger has joined #openstack-infra | 15:09 | |
*** rongze has quit IRC | 15:11 | |
*** jasond has joined #openstack-infra | 15:12 | |
*** dcramer_ has joined #openstack-infra | 15:14 | |
*** basha has joined #openstack-infra | 15:16 | |
*** ryanpetrello has joined #openstack-infra | 15:18 | |
*** markmcclain has quit IRC | 15:19 | |
*** apevec has quit IRC | 15:21 | |
*** alcabrera has joined #openstack-infra | 15:23 | |
*** datsun180b has joined #openstack-infra | 15:26 | |
*** sarob has joined #openstack-infra | 15:26 | |
*** oubiwan__ has joined #openstack-infra | 15:29 | |
*** markmcclain has joined #openstack-infra | 15:30 | |
*** dolphm_ has quit IRC | 15:30 | |
*** jcoufal has quit IRC | 15:30 | |
*** zehicle_at_dell has joined #openstack-infra | 15:31 | |
*** rwsu has joined #openstack-infra | 15:31 | |
*** rcarrillocruz has joined #openstack-infra | 15:33 | |
mriedem | just opened this against infra, not sure if it's a known issue yet or not: | 15:33 |
mriedem | https://bugs.launchpad.net/openstack-ci/+bug/1260767 | 15:33 |
uvirtbot | Launchpad bug 1260767 in openstack-ci "gate-nova-docs fails on master with "Remote call on precise14 failed"" [Undecided,New] | 15:33 |
*** rcarrillocruz1 has quit IRC | 15:35 | |
*** xchu has quit IRC | 15:36 | |
portante | mreidem: saw that earlier | 15:37 |
*** jcooley_ has quit IRC | 15:37 | |
*** SushilKM has joined #openstack-infra | 15:37 | |
portante | I think 1260654 | 15:38 |
*** jcooley_ has joined #openstack-infra | 15:38 | |
portante | https://bugs.launchpad.net/openstack-ci/+bug/1260654 | 15:38 |
*** dizquierdo has quit IRC | 15:38 | |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Critical,Fix released] | 15:38 |
portante | sdague filed that, I think | 15:38 |
*** rongze has joined #openstack-infra | 15:39 | |
sdague | portante actually ttx | 15:42 |
sdague | but you are right, it's a dup | 15:42 |
*** iv_m has quit IRC | 15:43 | |
* ttx admits only having checked the rechecks page | 15:43 | |
*** jcooley_ has quit IRC | 15:43 | |
*** marun has quit IRC | 15:43 | |
mriedem | thanks guys | 15:43 |
*** bnemec is now known as beekneemech | 15:45 | |
*** dims has quit IRC | 15:45 | |
*** dims has joined #openstack-infra | 15:47 | |
*** zehicle_at_dell has quit IRC | 15:51 | |
fungi | yassine: i've checked out our unit-test slaves and it doesn't look like the centos6 slaves have a zookeeper-server installed. in fact, it doesn't appear that centos 6.4 provides an rpm for any package named zookeeper-server in its standard yum package repositories | 15:52 |
*** NikitaKonovalov has quit IRC | 15:53 | |
fungi | yassine: the corresponding "zookeeper" package is installed on our ubuntu precise slaves however (both our python 2.7 and python 3.3 slave variants) | 15:53 |
*** mriedem has quit IRC | 15:56 | |
*** maurosr has quit IRC | 15:56 | |
fungi | yassine: http://paste.openstack.org/show/54962/ | 15:57 |
*** mriedem has joined #openstack-infra | 15:57 | |
*** sarob has quit IRC | 15:58 | |
*** mfer has quit IRC | 15:58 | |
*** maurosr has joined #openstack-infra | 15:59 | |
*** rossella_s has quit IRC | 16:00 | |
*** mdenny has joined #openstack-infra | 16:01 | |
*** rnirmal_ has joined #openstack-infra | 16:02 | |
*** rnirmal has quit IRC | 16:02 | |
*** rnirmal_ is now known as rnirmal | 16:02 | |
*** mfer has joined #openstack-infra | 16:03 | |
sdague | fungi: so https://bugs.launchpad.net/tempest/+bug/1260710 | 16:03 |
openstackgerrit | A change was merged to openstack-infra/release-tools: Add mpcut.sh for milestone-proposed branch cutting https://review.openstack.org/61389 | 16:03 |
uvirtbot | Launchpad bug 1260710 in tempest "testr lists both tests and unit tests in gate-tempest-python27 job" [High,In progress] | 16:03 |
sdague | it marked it as in progress, but didn't post the review | 16:03 |
*** talluri has quit IRC | 16:04 | |
sdague | which I find highly annoying | 16:04 |
sdague | and that seems to be the norm now | 16:04 |
*** talluri has joined #openstack-infra | 16:04 | |
sdague | is that intended, or a bug? | 16:04 |
jasond | is "reverify no bug" okay to use? if not, how would i go about identifying the bug to reverify? | 16:05 |
fungi | sdague: what's the corresponding review for that bug? it doesn't seem to be the one mentioned in the bug description | 16:06 |
sdague | fungi: https://review.openstack.org/#/c/62019/ | 16:06 |
fungi | sdague: https://review.openstack.org/#/c/62019/1..2//COMMIT_MSG | 16:07 |
fungi | that's why | 16:07 |
sdague | oh, right mtreinish failed on commit message | 16:07 |
mtreinish | sdague: did I have a period? | 16:08 |
sdague | mtreinish: Fixes bug doesn't link | 16:08 |
fungi | update_bug.py is not smart enough to know whether or not it's posted the review link, so it errs on the side of not spamming people on every patchset with another bug comment and just does it if it's patchset #1 | 16:08 |
sdague | Closes-Bug: #............... | 16:08 |
fungi | the diff there shows that he added the bug header on comment #2, which is one reason | 16:09 |
fungi | er, on patchset #2 | 16:09 |
mtreinish | sdague: sigh... ok I'll respin it | 16:09 |
*** basha has quit IRC | 16:09 | |
*** jcooley_ has joined #openstack-infra | 16:09 | |
mtreinish | fungi: will that be enough? | 16:09 |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce import group ordering https://review.openstack.org/54403 | 16:09 |
jeblair | i'm in favor of tightening the gerrit regex so it matches in the webui | 16:09 |
openstackgerrit | Sahid Orentino Ferdjaoui proposed a change to openstack/requirements: Tox fails to build environment because of MySQL-Python version https://review.openstack.org/62027 | 16:10 |
sdague | yeh, it would be more obvious if the behavior was the same on both | 16:10 |
fungi | using a standardized bug header will also help (so that it will also close the bug) but the reason it set in-progress and didn't comment with the link in the bug is that you didn't have a bug header on the initial patchset | 16:10 |
sdague | fungi: sure | 16:11 |
fungi | jeblair: i think we tightened the regex in gerrit as much as we could without losing links on review comments like "recheck bug 12345" | 16:11 |
uvirtbot | Launchpad bug 12345 in isdnutils "isdn does not work, fritz avm (pnp?)" [Medium,Fix released] https://launchpad.net/bugs/12345 | 16:11 |
sdague | but if it gets fixed now will it post? | 16:11 |
*** jasond has left #openstack-infra | 16:11 | |
*** Ryan_Lane has joined #openstack-infra | 16:12 | |
openstackgerrit | Sahid Orentino Ferdjaoui proposed a change to openstack/requirements: Tox fails to build environment because of MySQL-Python version https://review.openstack.org/62028 | 16:12 |
*** talluri has quit IRC | 16:13 | |
fungi | sdague: i can't remember if update_bug.py will set it to fix committed/released on a string like "Fixes bug 1260710" though i'm pretty sure "Fixes-bug: 1260710" works (even though closes is the recommended term in the wiki) | 16:13 |
uvirtbot | Launchpad bug 1260710 in tempest "testr lists both tests and unit tests in gate-tempest-python27 job" [High,In progress] https://launchpad.net/bugs/1260710 | 16:13 |
fungi | the goal being to drive contributors toward using standard git header formats for these so they can be more easily mined from commit logs in the future | 16:14 |
*** rongze has quit IRC | 16:14 | |
fungi | oh, also it should have been in the final paragraph of the commit message to be a proper header | 16:14 |
fungi | that extra blank line makes it not | 16:14 |
fungi | mtreinish: ^ | 16:15 |
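For reference, the standard form keeps the bug reference as a header in the final paragraph of the commit message. A hypothetical example (the subject and body text are invented for illustration; only the bug number comes from this discussion):

```
Filter unit tests out of the tempest testr list

Make the gate-tempest-python27 job list and run only tempest's own
tests instead of also picking up the unit tests.

Closes-Bug: #1260710
```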
*** jasond has joined #openstack-infra | 16:15 | |
jasond | does anybody know why this review says "Need Verified"? https://review.openstack.org/#/c/59851/ | 16:16 |
mtreinish | fungi: seriously | 16:17 |
mtreinish | do I need to do another revision? | 16:17 |
fungi | mtreinish: nope--just pointing out if you're trying to correct the commit message, that's part of it | 16:17 |
fungi | jasond: taken care of | 16:18 |
*** AaronGr_Zzz is now known as AaronGr | 16:18 | |
jasond | fungi: thanks! | 16:18 |
*** ilyashakhat_ has quit IRC | 16:20 | |
*** rongze has joined #openstack-infra | 16:21 | |
*** jcooley_ has quit IRC | 16:21 | |
*** jcooley_ has joined #openstack-infra | 16:21 | |
*** zehicle_at_dell has joined #openstack-infra | 16:22 | |
*** AaronGr is now known as AaronGr_afk | 16:22 | |
*** jcooley_ has quit IRC | 16:25 | |
*** sarob has joined #openstack-infra | 16:26 | |
*** hashar has quit IRC | 16:27 | |
*** sarob has quit IRC | 16:31 | |
*** zehicle has joined #openstack-infra | 16:32 | |
yassine | fungi: thank you for the information !! Do you know how could i fix this issue ? :/ | 16:33 |
*** zehicle_at_dell has quit IRC | 16:33 | |
*** johnthetubaguy has quit IRC | 16:33 | |
*** saschpe_ has quit IRC | 16:33 | |
*** johnthetubaguy1 has joined #openstack-infra | 16:33 | |
*** niska has quit IRC | 16:34 | |
*** mrodden1 has quit IRC | 16:34 | |
*** saschpe has joined #openstack-infra | 16:34 | |
fungi | yassine: i left a review comment on the change you linked, but in short unless you can get a zookeeper-server rpm into centos 6 main repositories or fedora epel such that we can yum install it on the test slaves, your other option for python 2.6 unit testing right now would be figuring out whether it can be installed and used locally in the jenkins user's home directory by your unit test job without | 16:35 |
fungi | needing root permissions on the system | 16:35 |
fungi | i'm not familiar enough with what zookeeper is or how it works to know whether that's possible | 16:35 |
*** ^d has joined #openstack-infra | 16:35 | |
*** dkliban has quit IRC | 16:37 | |
*** hughsaunders has quit IRC | 16:37 | |
*** prad has quit IRC | 16:37 | |
*** yamahata_ has quit IRC | 16:37 | |
*** changbl has quit IRC | 16:37 | |
*** openstackgerrit has quit IRC | 16:37 | |
*** Ghe_HPDiscover has quit IRC | 16:37 | |
*** juice has quit IRC | 16:37 | |
*** tian has quit IRC | 16:37 | |
*** iccha has quit IRC | 16:37 | |
*** Alex_Gaynor has quit IRC | 16:37 | |
*** jasond has quit IRC | 16:37 | |
*** hughsaunders_ has joined #openstack-infra | 16:37 | |
*** changbl_ has joined #openstack-infra | 16:37 | |
*** hughsaunders_ is now known as hughsaunders | 16:38 | |
*** nicedice has joined #openstack-infra | 16:38 | |
*** tian has joined #openstack-infra | 16:38 | |
*** jasond has joined #openstack-infra | 16:38 | |
*** dkliban has joined #openstack-infra | 16:38 | |
*** prad has joined #openstack-infra | 16:38 | |
*** yamahata_ has joined #openstack-infra | 16:38 | |
*** openstackgerrit has joined #openstack-infra | 16:38 | |
*** Ghe_HPDiscover has joined #openstack-infra | 16:38 | |
*** juice has joined #openstack-infra | 16:38 | |
*** iccha has joined #openstack-infra | 16:38 | |
*** Alex_Gaynor has joined #openstack-infra | 16:38 | |
*** niska has joined #openstack-infra | 16:38 | |
*** SushilKM has quit IRC | 16:40 | |
*** prad_ has joined #openstack-infra | 16:42 | |
*** StevenK_ has joined #openstack-infra | 16:42 | |
*** johnthetubaguy1 has quit IRC | 16:43 | |
*** SergeyLukjanov has quit IRC | 16:43 | |
*** lcestari has quit IRC | 16:44 | |
*** iccha_ has joined #openstack-infra | 16:44 | |
*** lcestari has joined #openstack-infra | 16:44 | |
*** StevenK has quit IRC | 16:44 | |
*** guitarzan has quit IRC | 16:44 | |
yassine | fungi: oh thanks! If i can wget the zookeeper tar then i can run the zookeeper server without root permissions, is it possible to wget from the slave ? | 16:44 |
*** Ghe_HPDi1cover has joined #openstack-infra | 16:45 | |
*** jasond` has joined #openstack-infra | 16:45 | |
dhellmann | mordred: responding to your query from monday, no I don't see a 1.2 release of oslo.messaging. Did you already talk to markmc about it? | 16:45 |
dhellmann | clarkb, sdague: responding to your comment from monday about overloading the branch-designator for wsme/pecan gate jobs, I'm not sure what that means. :-( | 16:46 |
*** zehicle has quit IRC | 16:46 | |
ruhe | is puppet-dashboard.openstack.org supposed to render some html on port 3000? | 16:47 |
*** johnthetubaguy has joined #openstack-infra | 16:47 | |
* dhellmann is happy for irc client history, but needs a better tool for dealing with irc while traveling | 16:47 | |
fungi | yassine: yes, that's fine, just be aware that downloads from the internet sometimes fail, especially if it's a large file, so it could cause your job to occasionally return a false negative result | 16:47 |
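A rough sketch of that wget approach, assuming a 3.4.x ZooKeeper tarball from the Apache archive (the version and URL are assumptions); everything runs from the jenkins user's home directory without root:

```sh
# fetch and unpack zookeeper locally on the slave (no root needed)
wget -q https://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar xzf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
# the shipped sample config listens on 2181 and stores data under /tmp
cp conf/zoo_sample.cfg conf/zoo.cfg
bin/zkServer.sh start
```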
*** pcrews has joined #openstack-infra | 16:47 | |
*** johnthetubaguy has quit IRC | 16:48 | |
*** guitarzan has joined #openstack-infra | 16:48 | |
*** johnthetubaguy has joined #openstack-infra | 16:48 | |
*** juice- has joined #openstack-infra | 16:48 | |
fungi | ruhe: it would if it were working, but it broke. there's a project underway to replace it with something called puppetboard (anteaya, Hunner and pleia2 are collaborating on it last i heard) | 16:48 |
ruhe | fungi: got it, thanks | 16:49 |
*** tian has quit IRC | 16:49 | |
*** dkliban has quit IRC | 16:49 | |
*** prad has quit IRC | 16:49 | |
*** yamahata_ has quit IRC | 16:49 | |
*** openstackgerrit has quit IRC | 16:49 | |
*** Ghe_HPDiscover has quit IRC | 16:49 | |
*** juice has quit IRC | 16:49 | |
*** iccha has quit IRC | 16:49 | |
*** Alex_Gaynor has quit IRC | 16:49 | |
*** jasond has quit IRC | 16:49 | |
*** juice- is now known as juice | 16:49 | |
*** prad_ is now known as prad | 16:49 | |
fungi | ruhe: puppet dashboard is unfortunately somewhat fragile, and was further complicated by boundlessly growing its mysql db until we couldn't effectively clean or resize it, so we eventually stopped trying while the replacement project is underway | 16:50 |
fungi | (there are simply too few of us to limp too many broken systems along indefinitely) | 16:51 |
*** esker has quit IRC | 16:51 | |
*** SushilKM has joined #openstack-infra | 16:51 | |
ruhe | let's hope puppetboard doesn't have these issues. i'll try to install it in my infra copy and see how it goes | 16:52 |
fungi | ruhe: it sounds like it will work out much better. lighter weight and actually supported (puppet dashboard was effectively dead upstream, we were running somewhat of a fork, since puppetlabs had moved to recommending their proprietary dashboard instead) | 16:53 |
fungi | ruhe: however it needs puppetdb, which we hadn't previously been using, so i think they're working on getting a manifest together to install that along with puppetboard | 16:53 |
*** tian has joined #openstack-infra | 16:54 | |
*** yamahata_ has joined #openstack-infra | 16:54 | |
*** Alex_Gaynor has joined #openstack-infra | 16:54 | |
fungi | the alternative we'd explored was switching to the sodabrew fork of puppet dashboard, since its upstream was also somewhat active still | 16:55 |
yassine | fungi: perfect ! i will wget then, it will simplify my script :) thank you for your help i really appreciate | 16:55 |
*** jcooley_ has joined #openstack-infra | 16:55 | |
*** danger_fo_away is now known as dangers | 16:55 | |
fungi | yassine: my pleasure--let me know if you have any other questions | 16:55 |
yassine | sure :) | 16:55 |
* fungi needs to disappear again for a moment, and will return shortly | 16:55 | |
sdague | do we have a bug bot anywhere? | 16:56 |
*** dkliban has joined #openstack-infra | 16:56 | |
sdague | I'd really like to get IRC message on new bugs | 16:56 |
sdague | for tempest, so we can basically keep new bugs down to 0 | 16:56 |
*** mrodden has joined #openstack-infra | 16:57 | |
clarkb | sdague: soren has one. it subscribes to bugs and alerts on imap entries | 16:58 |
clarkb | dhellman: basically in that designator you put a string saying this is a wsme/pecan job | 16:59 |
dhellmann | clarkb: does that go in the job definition in one of the yaml files? | 16:59 |
clarkb | everything else about the job matches the openstack gate so you stay in sync without mutual gating | 16:59 |
markmc | dhellmann, I didn't do a 1.2 release of oslo.messaging | 16:59 |
clarkb | dhellman: in projects.yaml where you instantiate the template | 16:59 |
dhellmann | markmc: I don't see any releases on pypi, should we do one? | 17:00 |
markmc | dhellmann, no, it wasn't in havana - first release will be in icehouse, and that will be 1.3 | 17:00 |
markmc | dhellmann, was going to do 1.2 when it looked like it was going to be in havana | 17:01 |
markmc | dhellmann, IOW, there's still room for some API changes | 17:01 |
dhellmann | markmc: why 1.3 if there is not yet a 1.2? I feel like we've had this conversation... | 17:01 |
dhellmann | ah | 17:01 |
markmc | dhellmann, would like them to be minor at this point yet | 17:01 |
markmc | dhellmann, matching oslo.config, for no great reason | 17:01 |
*** markmcclain has quit IRC | 17:01 | |
dhellmann | markmc: ok, I didn't think we were worried about matching release versions across libraries like that, but we can talk about it | 17:02 |
dhellmann | clarkb: what does the branch-designator buy us? a separate gate queue? so pecan gate jobs don't clog up the openstack gate? | 17:02 |
clarkb | dhellmann: correct that plus staying in sync with the openstack gate | 17:03 |
Hunner | Guh. Still haven't done any puppetboard stuff... It's hard to do stuff so close to work, I think >_< | 17:03 |
*** dkliban has quit IRC | 17:04 | |
clarkb | rather than two different templates that can diverge there is one template that can create jobs with arbitrary names | 17:04 |
*** tma996 has joined #openstack-infra | 17:04 | |
dhellmann | clarkb, so I would add a "devstack-jobs" entry to the jobs list for pecan with pipeline=gate and branch-designator=pecan-wsme or something like that? | 17:06 |
dhellmann | clarkb: or maybe that pipeline should be different, too? | 17:06 |
*** jooools has quit IRC | 17:06 | |
*** UtahDave has joined #openstack-infra | 17:07 | |
*** SushilKM has quit IRC | 17:08 | |
*** talluri has joined #openstack-infra | 17:08 | |
*** nprivalova_ has joined #openstack-infra | 17:11 | |
clarkb | dhellman: no, that sounds fine. you may not want all of devstack-jobs though | 17:12 |
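A rough sketch of such a projects.yaml entry, assuming a devstack-jobs job group parameterized by pipeline and branch-designator as described above (the exact group, project, and variable names here are assumptions):

```yaml
# hypothetical JJB projects.yaml snippet; names are illustrative only
- project:
    name: pecan
    jobs:
      - devstack-jobs:
          pipeline: gate
          branch-designator: -pecan-wsme
```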
*** dstanek_afk has joined #openstack-infra | 17:12 | |
*** dstanek has quit IRC | 17:13 | |
*** nprivalova has quit IRC | 17:13 | |
*** nprivalova_ is now known as nprivalova | 17:13 | |
*** dstanek_afk is now known as dstanek | 17:13 | |
dhellmann | clarkb: yeah, we'll look at the list and verify before including all of them | 17:16 |
dhellmann | clarkb: thanks for the tips | 17:17 |
*** ruhe has quit IRC | 17:19 | |
jeblair | #status log restarted gerritbot | 17:21 |
*** openstackgerrit has joined #openstack-infra | 17:22 | |
*** SergeyLukjanov has joined #openstack-infra | 17:23 | |
*** zehicle_at_dell has joined #openstack-infra | 17:23 | |
*** Alex_Gaynor has quit IRC | 17:26 | |
*** Alex_Gaynor has joined #openstack-infra | 17:26 | |
mriedem | are jobs timing out at all right now? | 17:26 |
mriedem | http://logs.openstack.org/52/55752/14/check/check-tempest-dsvm-full/3eb1378/console.html | 17:26 |
mriedem | http://logs.openstack.org/52/55752/14/check/check-tempest-dsvm-full/3eb1378/console.html#_2013-12-13_16_53_45_134 | 17:27 |
*** SushilKM has joined #openstack-infra | 17:28 | |
jeblair | mriedem: http://status.openstack.org/zuul/ says many check jobs have succeeded recently | 17:30 |
mriedem | so hiccup? | 17:30 |
jeblair | mriedem: no, i believe there is a current nondeterministic bug that causes jobs to run very long and time out | 17:31 |
mriedem | jeblair: i opened https://bugs.launchpad.net/openstack-ci/+bug/1260816 to recheck against | 17:31 |
uvirtbot | Launchpad bug 1260816 in openstack-ci "check-tempest-dsvm-full job timed out causing build failure" [Undecided,New] | 17:31 |
*** yaguang has joined #openstack-infra | 17:31 | |
*** dolphm has joined #openstack-infra | 17:32 | |
jeblair | mriedem: https://bugs.launchpad.net/tempest/+bug/1258682 | 17:34 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] | 17:34 |
jeblair | mriedem: i will mark your bug as a dup of that | 17:34 |
mriedem | jeblair: ah, thanks, maybe i didn't find it searching for 'Build timed out' in LP, at least not in openstack-ci where i was looking | 17:34 |
jeblair | mriedem: i also tagged it with 'gate-failure' which we've recently started doing to try to make these easier to find | 17:35 |
jeblair | mriedem: [i understand, you see how long it took me to find it :( ] | 17:35 |
*** freyes has joined #openstack-infra | 17:36 | |
fungi | jeblair: so i suppose gerritbot didn't return after one of the more recent netsplits? (i saw it in and out a few times on earlier splits today already) | 17:37 |
*** markmcclain has joined #openstack-infra | 17:38 | |
*** freyes has quit IRC | 17:41 | |
*** johnthetubaguy has quit IRC | 17:43 | |
*** SushilKM has quit IRC | 17:43 | |
*** reed has joined #openstack-infra | 17:43 | |
*** johnthetubaguy has joined #openstack-infra | 17:43 | |
*** reed has quit IRC | 17:43 | |
*** reed has joined #openstack-infra | 17:43 | |
jeblair | fungi: possibly; it seemed to be running | 17:45 |
*** basha has joined #openstack-infra | 17:45 | |
*** sandywalsh has joined #openstack-infra | 17:45 | |
*** ruhe has joined #openstack-infra | 17:46 | |
notmyname | jog0: I put my gate status code and url-generating script online https://github.com/notmyname/gate_status | 17:48 |
*** SergeyLukjanov_ has joined #openstack-infra | 17:48 | |
*** AaronGr_afk is now known as AaronGr | 17:48 | |
*** SergeyLukjanov has quit IRC | 17:48 | |
*** dolphm has quit IRC | 17:49 | |
*** dkliban has joined #openstack-infra | 17:50 | |
*** tma996 has quit IRC | 17:51 | |
jeblair | notmyname: fyi there's a jquery plugin to build graphite urls; see it in action at the bottom of view-source:http://status.openstack.org/zuul/index.html | 17:52 |
clarkb | fungi: I am going to upgrade jenkins on jenkins-dev to 1.543 now | 17:53 |
jeblair | jog0's graph uses it too | 17:53 |
notmyname | jeblair: cool. (but that would mean javascript and then I'd have to add "front end design" to my linkedin page and then I'd get more recruiter spam and ...) | 17:53 |
jeblair | notmyname: definitely not worth it :) | 17:53 |
notmyname | hehe | 17:53 |
*** dolphm has joined #openstack-infra | 17:53 | |
fungi | clarkb: awesome | 17:54 |
*** sdake_ has joined #openstack-infra | 17:54 | |
notmyname | jeblair: 12 hour buckets, over the last 11 days (that's how long you keep data?) http://not.mn/gate_status.html | 17:54 |
jeblair | clarkb: do you have a script to submit a simulated job completion event to log-gearman-worker? | 17:54 |
clarkb | jeblair: I don't, the worker doesn't receive job completion events | 17:55 |
jeblair | notmyname: it's been 11 days since we renamed the jobs (and when we renamed them, we did not move the graphite data) | 17:55 |
zaro | good morning | 17:55 |
notmyname | jeblair: ah, gotcha | 17:56 |
yaguang | help needed, a change to requirements stable/grizzly is failing the jenkins gate | 17:56 |
jeblair | clarkb: i know, it's complicated. there are several places where you could inject an artificial event for testing; i'm assuming you have no scripts that inject events into any such places? :) | 17:56 |
jeblair | notmyname: otherwise we do keep data for a year | 17:57 |
yaguang | for this patch https://review.openstack.org/#/c/61237/ | 17:57 |
clarkb | jeblair: not really no, I typically just run the client and worker locally and hook them up to a jenkins feed. jenkins is busy enough to get events that way :) | 17:57 |
notmyname | jeblair: are there any events generated when the zuul pipeline gets reset? I'd _really_ like to track that number | 17:57 |
jeblair | clarkb: is jenkins zmq public? | 17:57 |
clarkb | jeblair: no, port forwarding is necessary | 17:58 |
clarkb | I may have a stand in client though /me looks | 17:58 |
clarkb | jeblair: I do have a simple stand in client | 17:59 |
clarkb | would you like a copy of that? | 17:59 |
jeblair | notmyname: not atm, however i do think such a thing is possible; probably in zuul.scheduler.Scheduler._processOneItem. | 18:00 |
jeblair | clarkb: that would be lovely | 18:00 |
jeblair | notmyname: if you wanted to hack on zuul :) | 18:00 |
*** gaelL_ has quit IRC | 18:00 | |
notmyname | jeblair: I'll add it to the todo list, but I can't promise it will be near the top | 18:00 |
jeblair | notmyname: note that "resets of head item" and "resets of any item" are probably both interesting and distinct | 18:00 |
*** gaelL has joined #openstack-infra | 18:01 | |
jeblair | notmyname: if you don't get to it, i will eventually. | 18:01 |
notmyname | jeblair: with that number (and I'm guessing it will be high since the overall chance of success is so low), I think you can get a good feel for the value of the pipeline approach. I suspect that the current pipeline isn't doing much besides keeping the DC warm | 18:01 |
*** gyee has joined #openstack-infra | 18:01 | |
clarkb | jeblair: http://paste.openstack.org/show/54967/ super simple | 18:02 |
clarkb | it provides only the necessary subset of event data that the gearman worker relies on | 18:02 |
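The paste itself isn't reproduced here, but a hypothetical stand-in publisher along those lines might look like the sketch below, assuming the "onFinalized <json>" framing emitted by the Jenkins ZMQ event publisher (the port number and the event fields are assumptions for illustration):

```python
import json
import time

import zmq

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:8888")  # assumed publisher port

# minimal fake build-completion event; field names are assumptions
event = {
    "name": "gate-noop",
    "build": {
        "number": 1,
        "status": "SUCCESS",
        "full_url": "https://jenkins-dev.openstack.org/job/gate-noop/1/",
    },
}

while True:
    socket.send_string("onFinalized " + json.dumps(event))
    time.sleep(10)
```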
jeblair | notmyname: not sure what you mean by 'pipeline approach'? | 18:02 |
clarkb | stopping jenkins now | 18:02 |
clarkb | * on jenkins-dev | 18:02 |
*** yolanda has quit IRC | 18:04 | |
fungi | yaguang: https://review.openstack.org/55939 only just merged a few hours ago to address the iso8601 issues preventing grizzly integration testing, so this is probably a new bug which was being hidden by that one | 18:04 |
notmyname | jeblair: optimistically queueing all the patches rather than doing them serially. | 18:04 |
*** freyes has joined #openstack-infra | 18:05 | |
*** harlowja has quit IRC | 18:05 | |
*** sandywalsh has quit IRC | 18:05 | |
fungi | i think as long as we manage to merge 1.5 changes an hour on average, the pipeline is going at least as fast as serial testing would | 18:05 |
yaguang | fungi, yes, the iso8601 issue disappeared | 18:05 |
*** sandywalsh has joined #openstack-infra | 18:05 | |
yaguang | fungi, it seems there is a new one + sudo chown -R jenkins /opt/stack/new/savanna-dashboard | 18:06 |
yaguang | 2013-12-13 16:33:06.574 | + cd /opt/stack/new/requirements | 18:06 |
yaguang | 2013-12-13 16:33:06.597 | + python update.py /opt/stack/new/savanna-dashboard | 18:06 |
yaguang | 2013-12-13 16:33:06.598 | Traceback (most recent call last): | 18:06 |
yaguang | 2013-12-13 16:33:06.598 | File "update.py", line 94, in <module> | 18:06 |
yaguang | 2013-12-13 16:33:06.599 | main(sys.argv[1:]) | 18:06 |
yaguang | 2013-12-13 16:33:06.621 | File "update.py", line 90, in main | 18:06 |
yaguang | 2013-12-13 16:33:06.621 | _copy_requires(req, argv[0]) | 18:06 |
yaguang | 2013-12-13 16:33:06.622 | File "update.py", line 71, in _copy_requires | 18:06 |
yaguang | 2013-12-13 16:33:06.622 | dest_reqs = _parse_reqs(dest_path) | 18:06 |
yaguang | 2013-12-13 16:33:06.623 | File "update.py", line 49, in _parse_reqs | 18:06 |
jeblair | notmyname: ah, yes; i'd refer to that as speculative execution. but yes, as the test-subject system's reliability decreases it degrades to its worst-case behavior which is serial merging. | 18:06 |
yaguang | 2013-12-13 16:33:06.651 | pip_requires = open(filename, "r").readlines() | 18:06 |
yaguang | 2013-12-13 16:33:06.676 | IOError: [Errno 2] No such file or directory: '/opt/stack/new/savanna-dashboard/tools/pip-requires' | 18:06 |
yaguang | 2013-12-13 16:33:06.751 | Process leaked file descriptors. See http://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build for more information | 18:06 |
yaguang | 2013-12-13 16:33:07.287 | Build step 'Execute shell' marked build as failure | 18:06 |
fungi | yaguang: please use http://paste.openstack.org/ in the future | 18:06 |
clarkb | fungi: 1.543 is running on jenkins-dev. Want to give nodepool a spin with the correct NODEPOOL_SSH_KEY value? | 18:07 |
notmyname | fungi: rounding up from the current status to assume a 70% chance of passing, that means that a queue of 10 patches has a 2.8% chance of not being reset (ie the 10th patch has a 2.8% chance of landing) | 18:07 |
fungi | yaguang: grizzly was broken for so long that there are likely to be new external/dependency-related issues which crept in during that span | 18:07 |
jeblair | notmyname: i think we have the configuration structured so that it shouldn't be _worse_ than serial merging. but yes, it's, um, providing some load for our providers. | 18:07 |
fungi | notmyname: makes sense. just pointing out that if the average duration of our longest tests is 0.75 hours then serial testing is only going to merge 1.5 changes an hour best case (assuming none fail) | 18:08 |
notmyname | current queue depth of 34 means a 0.00077% chance of landing | 18:08 |
fungi | 0.00077% chance of landing on that iteration, but it will be automatically retried until there are no failures ahead of it in the pipeline | 18:09 |
yaguang | fungi, to debug the issue, where can I find the source code for check-requirements-integration-dsvm gate ? | 18:09 |
*** dolphm has quit IRC | 18:09 | |
*** dolphm has joined #openstack-infra | 18:10 | |
notmyname | fungi: right. I'm saying that the last item in the queue pretty much doesn't stand a chance of getting through without a retry | 18:10 |
fungi | yaguang: in openstack-dev/pbr, openstack-infra/pypi-mirror and openstack-infra/config. i'll get you urls to the relevant files in just a moment | 18:10 |
notmyname | fungi: and the result is that there is a pretty low chance of doing much more than "serial speed", but now we have a bunch of servers wasting cycles for tests on patches that won't land | 18:11 |
*** jpeeler has quit IRC | 18:11 | |
*** dolphm has quit IRC | 18:11 | |
fungi | notmyname: agreed--there may be a numbers game to determining a sweet spot for maximum pipeline length beyond which it makes no sense to start jobs until you get closer to the head of the gate | 18:11 |
*** dolphm_ has joined #openstack-infra | 18:11 | |
notmyname | fungi: right | 18:11 |
*** jpeeler has joined #openstack-infra | 18:12 | |
*** dolphm_ has quit IRC | 18:12 | |
fungi | dependent on the current/recent average failure rate for changes | 18:12 |
jeblair | notmyname: one thing that has been considered is allowing elastic-recheck to see non-final job results to collect more data on bug frequency | 18:12 |
jeblair | notmyname: that is a potential use for the otherwise discarded test runs further down the queue | 18:12 |
notmyname | jeblair: that would be good. it would magnify problem areas | 18:12 |
fungi | i suspect the slave discard/rebuild overhead places the sweet spot somewhere in the vicinity of 50-100% use of the maximum pool size/quota aggregate too | 18:14 |
notmyname | jeblair: fungi: but the real source of the problem is that even a 5% pass rate drop has a _massive_ effect on the efficiency of the overall gate queue. the 34th item in the queue only has an 18.4% chance of landing with no retries even if the pass rate is 95% (as opposed to the current <70%) | 18:14 |
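The arithmetic behind those percentages is just the per-change pass rate compounded down the queue; a quick sketch (the figures quoted above presumably used slightly different rounding of the pass rate):

```python
# chance that the Nth change in the gate pipeline merges on this pass:
# it and every change ahead of it must all succeed
def chance_of_landing(pass_rate, depth):
    return pass_rate ** depth

for pass_rate in (0.70, 0.95):
    for depth in (10, 34):
        print("pass rate %.0f%%, queue depth %2d: %.5f%% chance of landing"
              % (pass_rate * 100, depth,
                 100 * chance_of_landing(pass_rate, depth)))
```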
fungi | since that eliminates nodepool's ability to get ahead of the node demand | 18:14 |
*** alcabrera is now known as alcabrera|afk | 18:14 | |
notmyname | and I don't think this is big revelation to anyone, but it's at lest a new way to see and track the data | 18:15 |
jeblair | notmyname: yep; that's the impetus behind jog0's effort to try to get on gate bugs early. | 18:15 |
jeblair | notmyname: ++ more visibility | 18:15 |
jeblair | okay, back to skynet for me | 18:16 |
*** freyes has quit IRC | 18:16 | |
*** ruhe has quit IRC | 18:16 | |
*** matel has quit IRC | 18:16 | |
clarkb | fungi: ready to nodepool on jenkins-dev? | 18:17 |
fungi | clarkb: we can. gimme just a minute | 18:17 |
clarkb | no rush, ping me when I should pay attention | 18:17 |
fungi | yaguang: the meat of that job is this script... https://git.openstack.org/cgit/openstack-dev/pbr/tree/tools/integration.sh | 18:18 |
yaguang | fungi, many thanks :) | 18:18 |
fungi | yaguang: the run-mirror command it's using to test building the set is http://git.openstack.org/cgit/openstack-infra/pypi-mirror/tree/pypi_mirror/cmd/run_mirror.py | 18:19 |
*** basha has quit IRC | 18:19 | |
fungi | yaguang: the job definition (the entry point for jenkins) can be seen at http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/jenkins_job_builder/config/requirements.yaml#n1 | 18:20 |
fungi | yaguang: and that job is actually running the integration test script within the context of these https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/ | 18:21 |
*** Ryan_Lane has quit IRC | 18:22 | |
fungi | so i guess four relevant projects involved in running that | 18:22 |
*** gyee has quit IRC | 18:22 | |
fungi | (not to mention the openstack/requirements project itself, where the projects list and global requirements reside_ | 18:22 |
yaguang | maybe some projects didn't have a pip-requires file initially | 18:23 |
yaguang | cloning savanna-dashboard | 18:24 |
yaguang | fungi, thanks a lot for the info | 18:24 |
*** harlowja has joined #openstack-infra | 18:24 | |
fungi | yaguang: actually, i think this may be more involved. we lack a lot of the mechanisms for requirements testing in grizzly since they were developed in the havana cycle and not all backported. we may be better off setting that job to only run on stable/havana and later for now | 18:25 |
fungi | mordred: ^ opinion? | 18:25 |
*** basha has joined #openstack-infra | 18:26 | |
fungi | yaguang: i have an outstanding change to backport some of that and try to get it working, but it was waiting on the iso8601 situation to clear up. at the moment i don't have any reasonable expectation that job will run correctly at all | 18:26 |
fungi | i'll propose a change real quick to exclude stable/grizzly until more of those bits are in place (though we're getting close enough to eol for that release that it may not make sense to invest much more time in requirements consistency there anyway) | 18:27 |
yaguang | fungi, I also have some backports that have been blocked for a long time | 18:28 |
clarkb | fungi: I would go along with that | 18:29 |
*** rwsu has quit IRC | 18:32 | |
harlowja | qq, are stackforge gate/merge jobs currently disabled? | 18:32 |
harlowja | wondering if i should kick https://review.openstack.org/#/c/60850/ to try to get it to move, or not worry about it yet | 18:33 |
*** mgagne1 has joined #openstack-infra | 18:33 | |
*** mgagne1 has quit IRC | 18:33 | |
*** mgagne1 has joined #openstack-infra | 18:33 | |
*** mgagne has quit IRC | 18:34 | |
*** mgagne has joined #openstack-infra | 18:34 | |
*** mgagne has quit IRC | 18:34 | |
*** mgagne has joined #openstack-infra | 18:34 | |
*** yaguang has quit IRC | 18:34 | |
fungi | harlowja: i'm not immediately seeing a good reason for 60850 not to be in progress on http://status.openstack.org/zuul/ so it may merit further investigation | 18:35 |
harlowja | k, another one of interest, http://logs.openstack.org/20/54220/36/check/gate-taskflow-pep8/444a457/console.html | 18:35 |
fungi | harlowja: there's nothing special going on for stackforge... it's treated the same as far as whether and when gating is started | 18:35 |
*** jergerber has joined #openstack-infra | 18:35 | |
harlowja | kk, thx fungi | 18:35 |
harlowja | fungi should i try to kick those (recheck no bug) or just leave them for a little? | 18:37 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Don't run requirements integration for Grizzly https://review.openstack.org/62055 | 18:37 |
fungi | harlowja: the failure in the log you linked can be rechecked or reverified against bug 1260654 | 18:38 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Critical,Fix released] https://launchpad.net/bugs/1260654 | 18:38 |
harlowja | k, thx fungi | 18:38 |
fungi | precise14 was not a happy camper this morning | 18:38 |
*** mgagne1 has quit IRC | 18:38 | |
*** basha has quit IRC | 18:38 | |
harlowja | :) | 18:39 |
fungi | harlowja: on 60850 i think you *may* have originally set approval without a +2 vote and then added the +2 vote after, which might be the reason. try removing and adding your approval on it | 18:40 |
harlowja | kk | 18:40 |
*** rwsu has joined #openstack-infra | 18:40 | |
*** zehicle_at_dell has quit IRC | 18:41 | |
*** gyee has joined #openstack-infra | 18:42 | |
*** herndon has joined #openstack-infra | 18:43 | |
*** johnthetubaguy has quit IRC | 18:44 | |
fungi | harlowja: looks like you undid your +2 code review on it (which you should add back) but did not remove your +1 approve (which is the one you need to reapply for zuul to notice) | 18:44 |
harlowja | ah | 18:44 |
harlowja | wrong one, thx | 18:44 |
fungi | glad to help | 18:44 |
harlowja | need more coffee, ha | 18:44 |
*** basha has joined #openstack-infra | 18:45 | |
*** rossella_s has joined #openstack-infra | 18:45 | |
*** zehicle_at_dell has joined #openstack-infra | 18:46 | |
fungi | harlowja: however, that theory didn't pan out since it's still not being tested. i think it may be because you have a draft change indirectly depending on it (61689), and i think we may still have a corner-case bug where if zuul can't retrieve and inspect the entire chain of dependent and reverse-dependent changes, it doesn't enqueue | 18:49 |
harlowja | ah | 18:49 |
fungi | zuul, like the rest of the general public, is blind to draft changes | 18:50 |
clarkb | fungi: this is where you say "don't use drafts" :) | 18:50 |
fungi | (one of the reasons we recommend against the draft feature) | 18:50 |
fungi | heh | 18:50 |
^d | Ugh, drafts. | 18:50 |
harlowja | ya, and 61689 seems hidden | 18:50 |
harlowja | hmmm | 18:50 |
*** rcarrillocruz1 has joined #openstack-infra | 18:51 | |
*** herndon has quit IRC | 18:51 | |
harlowja | so if that draft is not a draft but a WIP that should solve this? | 18:51 |
*** praneshp has joined #openstack-infra | 18:51 | |
clarkb | yes | 18:51 |
harlowja | k | 18:51 |
*** rcarrillocruz has quit IRC | 18:52 | |
harlowja | i think i know who owns that draft, will bug him | 18:52 |
fungi | harlowja: it should, yes, though one of the patches in the set will need reapproval again probably after you publish that draft | 18:52 |
fungi | so that zuul will notice | 18:52 |
*** mriedem has quit IRC | 18:52 | |
harlowja | k | 18:53 |
harlowja | thx guys | 18:53 |
*** apevec has joined #openstack-infra | 18:53 | |
fungi | the last time i looked into one of these, i found a traceback in zuul's log from where it tried to retrieve a reverse-dependent change which was in a draft state, and failed to enqueue the non-draft parent change as a result | 18:53 |
fungi | can't remember if i filed a bug or not | 18:53 |
harlowja | seems like it should almost skip over drafts completely | 18:53 |
apevec | mordred, https://review.openstack.org/61237 (grizzly reqs) failed on savanna, but savanna doesn't have grizzly branch afaict? | 18:54 |
fungi | well, it can't have any hope of skipping them if they're required for the change in question, but if they're merely draft changes requiring the non-draft change that seems like one we could do something about | 18:54 |
fungi | apevec: https://review.openstack.org/62055 | 18:55 |
apevec | what's requirements-integration test doing ? | 18:55 |
apevec | fungi, ah thanks | 18:55 |
clarkb | dkranz: re https://review.openstack.org/#/c/61850/ were my suggestions in patchset 2 not good? (I think the logic in patchset 3 is much more complicated than it needs to be) | 18:55 |
harlowja | fungi agreed, for reviews that are dependent on a draft, ya, nothing u can do, but the other way around (a draft dependent on a review) seems like u could just ignore that draft (and all its dependents, if any) | 18:56 |
*** basha has quit IRC | 18:56 | |
fungi | apevec: i think that job got added while grizzly was broken from iso8601 so we didn't think about the implications on that branch | 18:56 |
clarkb | harlowja: the greater problem is that drafts just don't work | 18:56 |
harlowja | or that clarkb :) | 18:56 |
clarkb | there are so many corner cases where they fall over. It isn't just zuul having a hard time | 18:57 |
fungi | the worst part of gerrit drafts, in my opinion, is that as the gerrit server admin you can't even disable them | 18:57 |
harlowja | easy/hard to remove the draft feature completely? | 18:57 |
harlowja | ah | 18:57 |
fungi | if it were a config option, i wouldn't care | 18:57 |
apevec | fungi, thanks, I've added comment in the review to prevent rechecks in vain | 18:57 |
fungi | instead it's baked in, non-optional and thus an attractive nuisance | 18:58 |
harlowja | fungi agreed | 18:58 |
fungi | also, the idea of stashing "hidden" changes in progress in the code review system runs pretty counter to what i think open development processes are all about | 18:59 |
*** alcabrera|afk is now known as alcabrera | 18:59 | |
*** yamahata_ has quit IRC | 18:59 | |
fungi | apevec: i think that requirements change is also probably not really absolutely necessary, since i'm pretty sure we don't do requirements enforcement on stable branches anyway (at least not for grizzly but i think also still not for havana either) | 19:00 |
*** mrodden has quit IRC | 19:00 | |
apevec | fungi, true, but it'd be nice to keep new updates somewhat synced | 19:01 |
fungi | (though the havana ones do need to get enforced. on my to do list to check back into the state of those) | 19:01 |
fungi | apevec: agreed | 19:02 |
clarkb | fungi: I am fiddling with passing NODEPOOL_SSH_KEY into the daemon env in the init script, but I am probably better off patching nodepool to accept that key as an option | 19:05 |
fungi | k. i'm free to help as soon as i finish filing this zuul bug i should have filed ages ago | 19:06 |
clarkb | awesome | 19:06 |
*** dstanek has quit IRC | 19:06 | |
clarkb | I am still reading source to try and figure out how this best fits in | 19:06 |
anteaya | any point in adding a comment in git review so that if someone does use the flag to submit a draft they know they are creating a painful situation? | 19:07 |
clarkb | half tempted to dump the public key literally into the yaml config file and just read it there | 19:07 |
anteaya | like "are you sure you want to create a draft? This will bite you later." | 19:07 |
clarkb | but then you have to sort out logic like kicking off image rebuilds if the config changes which I don't think exists today | 19:07 |
fungi | clarkb: it's not as if we don't do similar things elsewhere (though the path to a keyfile would certainly be nicer) | 19:08 |
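One hedged sketch of the init-script route being weighed here would be a defaults file the init script could source before starting the daemon (the path, and the assumption that the init script sources and exports it, are both hypothetical):

```sh
# /etc/default/nodepool (hypothetical) -- sourced by the init script so
# the daemon environment carries the key used when building images
NODEPOOL_SSH_KEY="ssh-rsa AAAAB3Nza... nodepool@nodepool-dev"
export NODEPOOL_SSH_KEY
```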
notmyname | anteaya: draft == WIP status? | 19:08 |
anteaya | notmyname: no, draft means only certain folks can see it | 19:08 |
notmyname | ah | 19:08 |
anteaya | WIP is a button in the gui for the patch | 19:08 |
pleia2 | better to use wip than draft | 19:08 |
*** mrodden has joined #openstack-infra | 19:08 | |
anteaya | submit and then push "work in progress" | 19:08 |
clarkb | except it isn't completely private when you draft, anyone can still fetch the code if they are smart | 19:09 |
anteaya | pleia2: yes | 19:09 |
fungi | we need to add a wip flag back into git-review but have been holding off until we see the state in gerrit 2.9 | 19:09 |
anteaya | kk | 19:09 |
anteaya | notmyname: and draft makes future operations on that patch a pain | 19:09 |
fungi | since the wip feature we're using now exists only in our own fork of gerrit 2.4 | 19:09 |
anteaya | which I believe was what was being discussed above | 19:09 |
*** mriedem has joined #openstack-infra | 19:10 | |
*** mgagne has quit IRC | 19:17 | |
sdague | so one place where I think it would be good for infra to auto recheck would be when any of the test results come back as UNSTABLE | 19:18 |
sdague | as that clearly was an infra fail | 19:18 |
*** sandywalsh has quit IRC | 19:19 | |
fungi | clarkb: i assume it's safe to blow away the bad images and nodes from last night's nodepool-dev experiments | 19:19 |
*** rongze has quit IRC | 19:20 | |
*** dims has quit IRC | 19:21 | |
clarkb | fungi: yup should be | 19:22 |
clarkb | they weren't used for anything | 19:22 |
clarkb | sdague: do we still have instances of UNSTABLE jobs making it to reporting? I think the problem there is that when zuul cancels jobs intentionally they sometimes report back as UNSTABLE | 19:23 |
*** mgagne has joined #openstack-infra | 19:23 | |
*** mgagne has quit IRC | 19:23 | |
*** mgagne has joined #openstack-infra | 19:23 | |
clarkb | but I suppose in those cases we would know why | 19:23 |
sdague | clarkb: https://review.openstack.org/#/c/61778/4 just got hit by it | 19:23 |
*** sharwell has joined #openstack-infra | 19:23 | |
sdague | because basically UNSTABLE is completely useless to a person, since it means there typically aren't any logs. So the only option is recheck no bug anyway | 19:24 |
clarkb | sdague: well we should always have the console log... | 19:25 |
*** CaptTofu has joined #openstack-infra | 19:25 | |
clarkb | but it is usually an infra problem | 19:25 |
sdague | clarkb: sometimes we don't | 19:25 |
fungi | right, depends on how long it's been | 19:25 |
jeblair | sdague: are you talking about errors from the bad jenkins slaves earlier? | 19:27 |
fungi | the other problem there is that jenkins will persist jobs to the same slaves if available | 19:27 |
sdague | jeblair: that might be what this was | 19:27 |
jeblair | fungi: was precise20 one of those? | 19:28 |
sdague | just trying to think about improvements to the system | 19:28 |
sdague | yes it was | 19:28 |
fungi | jeblair: yep | 19:28 |
fungi | these are things i expect will get better once we no longer run tests on long-lived slaves and use nodepool. not too much longer now | 19:28 |
*** jaypipes has joined #openstack-infra | 19:28 | |
jeblair | sdague: so zuul does re-launch jobs when it detects some kinds of jenkins failures | 19:29 |
sdague | jeblair: ok, so maybe expand that? | 19:29 |
jeblair | sdague: though obviously this isn't one of them. it's possible that retrying on unstable for this error would have made things better, inasmuch as it may have eventually been assigned to a node other than precise20 (possibly after retrying 200 times or something because of what fungi just pointed out) | 19:30 |
*** yassine has quit IRC | 19:30 | |
jeblair | sdague: but often retrying on unstable results isn't going to get us anywhere, and may make things worse (logs.o.o full as an example) | 19:30 |
fungi | http://logs.openstack.org/55/62055/1/check/gate-config-layout/5ad5222/console.html "Building remotely on bare-precise-rax-ord-850570..." | 19:31 |
jeblair | sdague: so i don't think that build result alone is enough to automate a retry on | 19:31 |
sdague | jeblair: so classifying the kind of problem is probably important | 19:31 |
sdague | honestly, an exception like that should down that node | 19:31 |
jeblair | sdague: i agree; i think that's a jenkins bug.... | 19:31 |
fungi | i bet retrying an unstable job once we're using bare nodepool nodes for them will be slightly more effective | 19:31 |
sdague | about every 3 weeks we have a node go wonky like that and destroy an entire development day for .eu | 19:31 |
sdague | because there is no one to solve that in that TZ | 19:32 |
jeblair | sdague: but we think that going to all-dynamic slaves will solve this problem | 19:32 |
sdague | jeblair: ok, well if that's close, cool | 19:32 |
fungi | sdague: see the link i posted | 19:33 |
fungi | we're already doing it some | 19:33 |
jeblair | sdague: it's very much in progress ^ :) | 19:33 |
sdague | jeblair: ok cool | 19:33 |
fungi | i did mention that in the bug when i resolved it as well | 19:34 |
sdague | every time we have a 'splode I just like to figure out "how does this problem never happen again" | 19:34 |
jeblair | sdague: this is a unit-test like job that ran on one: http://logs.openstack.org/54/61954/2/check/gate-config-layout/0702552/console.html | 19:34 |
sdague | fungi: sure, I guess timeline was the question | 19:34 |
jeblair | sdague: yes, me too. sometimes that involves a long multi-step process. fortunately we're near the end of this one. | 19:35 |
sdague | cool | 19:35 |
clarkb | the slave threading should help with this too | 19:35 |
fungi | i hope so | 19:36 |
sdague | yeh, it's just been a very bad gate week, and only slightly related to actual openstack bugs :) | 19:36 |
clarkb | I think errors like this are jenkins being unable to maintain 300 ssh connections with thousands of threads all vying for cpu cycles | 19:36 |
*** dims has joined #openstack-infra | 19:36 | |
fungi | i suspect precise14 and precise20 got into a bad state after we restarted jenkins02 (timeline seems about right) but wasn't obvious until we'd all gone to sleep | 19:37 |
jeblair | sdague: to be fair, i think the 30% failure rate in openstack is more than a slight contribution. | 19:37 |
*** talluri has quit IRC | 19:37 | |
jeblair | fungi, clarkb: they never recover, so i think it's more than just contention. | 19:37 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/nodepool: Allow for ssh key file path in config. https://review.openstack.org/62066 | 19:38 |
*** talluri has joined #openstack-infra | 19:38 | |
jeblair | clarkb: do you need that in asap? ^ | 19:38 |
clarkb | jeblair: I don't think so | 19:38 |
clarkb | jeblair: we will get by running nodepool in the foreground for now | 19:38 |
praneshp | hey all, was any of you able to run the docs test successfully after pinning sphinx<1.2? | 19:39 |
clarkb | praneshp: yes | 19:39 |
fungi | clarkb: we're also back to a clean slate now--old images and nodes deleted successfully | 19:39 |
clarkb | fungi: awesome | 19:39 |
sdague | jeblair: how are you computing that #? because while the SSH race is bad, it's not 30% bad | 19:39 |
clarkb | praneshp: you may need to update tox.ini to disable pip install --pre | 19:39 |
clarkb | praneshp: line 9 of nova's tox.ini does this | 19:40 |
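For context, the kind of tox.ini override being referred to probably looks something like the following; the exact line in nova's tox.ini may differ, so treat this as an illustrative sketch only:

    # tox.ini excerpt -- override tox's install command so that pip does not
    # pull in pre-release packages (such as the sphinx 1.2 betas) via --pre
    [testenv]
    install_command = pip install -U {opts} {packages}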
fungi | sdague: i'm guessing it was an instance of http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/5000/600/5652/5652.strip.gif | 19:40 |
praneshp | clarkb thanks. Let me look into my tox.ini | 19:41 |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 19:41 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 19:41 |
sdague | fungi: :) | 19:42 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/nodepool: Allow for ssh key file path in config. https://review.openstack.org/62066 | 19:42 |
*** sandywalsh has joined #openstack-infra | 19:43 | |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 19:43 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 19:43 |
jeblair | sdague: http://not.mn/gate_status.html | 19:44 |
fungi | yeah, that does seem to average out to about 30% failing | 19:44 |
sdague | jeblair: so that includes all the fails, including the infra fails, which currently are the #1 recheck bug | 19:44 |
fungi | based on recent freshness metrics presumably | 19:45 |
jeblair | sdague: the infra fails are the dip from a few hours ago. | 19:45 |
praneshp | clarkb i don't have a line relating to pip install --pre in my tox.ini https://review.openstack.org/#/c/61615/17/tox.ini | 19:45 |
clarkb | sdague: which bug is that? rechecks page says 1253896 which isn't infra | 19:45 |
clarkb | praneshp: do you have a line like line 9 in nova's tox.ini? | 19:45 |
praneshp | one sec. | 19:46 |
sdague | http://status.openstack.org/elastic-recheck/ | 19:46 |
jeblair | sdague: this chart is based on jog0's chart which, on the right edge, measures real-time failure rates of jobs | 19:46 |
praneshp | clarkb nope. | 19:46 |
clarkb | praneshp: that is what you need | 19:46 |
sdague | jeblair: so I'm not actually trying to argue who's more to blame here | 19:46 |
praneshp | ok, let me try, thanks | 19:46 |
sdague | I'm just saying, it's really hard to get anyone to look at the ssh bug when things are exploding for other reasons | 19:46 |
*** rossella_s has quit IRC | 19:47 | |
jeblair | sdague: sure. but you included some hyperbole in your statements that i don't think helped the situation. | 19:47 |
*** zehicle_at_dell has quit IRC | 19:47 | |
openstackgerrit | Matt Farina proposed a change to openstack-infra/config: New project request: PHP-Client https://review.openstack.org/62069 | 19:48 |
clarkb | fungi: were you going to start nodepool in the foreground? | 19:49 |
*** dolphm has joined #openstack-infra | 19:49 | |
clarkb | lunch should be starting here shortly but will do my best to pay attention | 19:49 |
*** Ryan_Lane has joined #openstack-infra | 19:50 | |
fungi | clarkb: yeah, i was going to try `sudo -u nodepool NODEPOOL_SSH_KEY=~jenkins/.ssh/id_rsa.pub nodepoold -d` in a screen session, but the jenkins public key isn't readable by the nodepool user so i'm pondering options | 19:52 |
anteaya | fwiw, we are working hard on the ssh bug, it is proving to be a tricky one, markmcclain salv-orlando beagles and dkehn are all working on it right now | 19:53 |
anteaya | will update when we have anything | 19:53 |
*** ^d is now known as ^demon|away | 19:53 | |
fungi | clarkb: i think i may just resort to copying it somewhere accessible for now (it's not as if the file's sensitive anyway) | 19:53 |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Rename the subscriber map to be a more generic config file. https://review.openstack.org/62073 | 19:54 |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Allow configurable mappings to different LP projects https://review.openstack.org/62074 | 19:54 |
clarkb | fungi: ++ not sensitive | 19:54 |
fungi | okay, it's running | 19:55 |
clarkb | fungi: I think the var needs to have the actual file contents | 19:55 |
*** CaptTofu has quit IRC | 19:55 | |
clarkb | the path won't work there | 19:55 |
jeblair | mriedem: see my comment in https://bugs.launchpad.net/tempest/+bug/1258682 | 19:55 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] | 19:55 |
fungi | ohhhhhhhh | 19:55 |
* fungi totally misread it | 19:56 | |
jeblair | mriedem: not all timeouts are due to the same cause. | 19:56 |
fungi | thanks clarkb | 19:56 |
clarkb | fungi: it is passed literally to puppet on the remote end | 19:56 |
fungi | mmm, nodepool is also like the honey badger when it comes to trapping sigint, i see | 19:57 |
jeblair | mriedem: however, i know of no current infra issues that would contribute to timeouts, so i think we can assume that all _current_ timeouts are due to the unknown bug | 19:57 |
mriedem | jeblair: ok, i just pushed an e-r query for it | 19:57 |
jeblair | sdague: ^ this is a big untracked contributor for gate failures | 19:57 |
mriedem | since there are no logs with errors besides console.html, i didn't have much to base the query on | 19:57 |
jeblair | sdague: that makes things worse by taking 45 min jobs to 90 mins | 19:57 |
clarkb | fungi: ya :( will probably need to delete the image build stuff too | 19:57 |
mriedem | jeblair: https://review.openstack.org/#/c/62067/ | 19:58 |
fungi | clarkb: i plan to | 19:58 |
*** dstanek has joined #openstack-infra | 19:59 | |
sdague | mriedem: can yuo change the message part to | 19:59 |
fungi | clarkb: cleaned up... so how about: sudo -u nodepool NODEPOOL_SSH_KEY="`cat /tmp/id_rsa.pub`" nodepoold -d | 19:59 |
fungi | clarkb: is that what you had in mind? | 19:59 |
sdague | message:"Build timed out (after" AND message:"minutes). Marking the build as failed." | 19:59 |
*** jaypipes has quit IRC | 19:59 | |
sdague | so it catches all the job timeouts, not just the ones set to 90 minutes | 20:00 |
*** dcramer_ has quit IRC | 20:00 | |
clarkb | fungi: ya, see the nodepool readme, that is basically what it does | 20:00 |
clarkb | sdague: that query is even better than mine :) | 20:00 |
jeblair | sdague, mriedem: we'll want to remove the query asap after fixing the bug too, because lots of people upload broken code that times out | 20:00 |
fungi | clarkb: founf it. thanks | 20:01 |
fungi | er, found | 20:01 |
sdague | so 75 hits over 7 days actually makes it 9th in the e-r bug list (by count) | 20:02 |
sdague | just to get a sense of relative frequency | 20:02 |
*** lcestari has quit IRC | 20:02 | |
fungi | clarkb: for the benefit of our sanity, i checked the log and nodepool *does* think it needs two nodes, so could be an off-by-one/rounding error, or maybe that's an effect of the load prediction heuristic | 20:03 |
clarkb | fungi: so if I sudo nodepool list I should see the data from the foreground process right? since this is all in the DB | 20:03 |
clarkb | fungi: weird | 20:03 |
fungi | clarkb: image-list at the moment | 20:03 |
fungi | clarkb: list will start showing content once the image finishes building | 20:03 |
clarkb | fungi: I wonder if the heuristic will always do one + what it determines | 20:04 |
clarkb | or some other silly off by one error | 20:04 |
fungi | clarkb: i also have both commands being called every 60 seconds under watch in the second window of that screen session | 20:04 |
mriedem | sdague: yeah, i can update the message | 20:04 |
fungi | clarkb: wild guess would be that it's rounding up from very small values of 1 ;) | 20:05 |
*** eharney has quit IRC | 20:05 | |
* fungi has not looked back at that section of the code to make a more reasoned guess | 20:06 | |
jeblair | sdague: yeah, just pointing out that i think the 2x runtime factor aggravates its severity (when it hits, it has the same throughput effect of hitting twice in a row). | 20:06 |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 20:06 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:06 |
sdague | yep, definitely | 20:06 |
jeblair | clarkb, fungi: is there something i can help elucidate? | 20:06 |
*** harlowja has quit IRC | 20:07 | |
sdague | also, the folks trying to land stable/grizzly patches that didn't fix their doc jobs are a huge problem now as well | 20:07 |
fungi | jeblair: min-ready is set to 1 and nodepool believes it needs 2 nodes | 20:07 |
mriedem | 219 hits > 77 hits: | 20:07 |
mriedem | http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQnVpbGQgdGltZWQgb3V0IChhZnRlclwiIEFORCBtZXNzYWdlOlwibWludXRlcykuIE1hcmtpbmcgdGhlIGJ1aWxkIGFzIGZhaWxlZC5cIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4Njk2NTA5OTI1NX0= | 20:07 |
sdague | as a neutron job will reset ahead of them, then they'll be put back into the zuul queue | 20:07 |
mriedem | good call sda | 20:07 |
mriedem | sdague: * | 20:07 |
clarkb | jeblair: at this point I don't think so | 20:07 |
anteaya | sdague: the neutron job being the ssh bug? | 20:07 |
sdague | anteaya: the ssh bug that I pointed out as the top bug yesterday | 20:08 |
anteaya | yes | 20:08 |
fungi | clarkb: jeblair: i'm less concerned with nodepool math problems at the moment and just want to make sure we have all the moving parts in place on jenkins-dev | 20:08 |
anteaya | which 4 devs are working on now | 20:08 |
sdague | mriedem: actually, that's catching a ton of swift issues | 20:08 |
sdague | in their unit tests | 20:08 |
anteaya | continuing from yesterday | 20:08 |
sdague | so I'm not sure that was a good call | 20:08 |
jeblair | fungi: but i'm curious, what's the math problem? | 20:08 |
notmyname | sdague: ? (swift ping) | 20:08 |
fungi | jeblair: min-ready is set to 1 and nodepool believes it needs 2 nodes | 20:09 |
fungi | jeblair: with no jobs underway on jenkins-dev | 20:09 |
jeblair | fungi: can i see the debug output from the allocator? | 20:09 |
mriedem | sdague: clarkb: like this: http://logs.openstack.org/15/60215/2/check/gate-swift-python26/1b3754e/console.html | 20:09 |
fungi | jeblair: probably so. i'll fish it out | 20:09 |
fungi | jeblair: scratch that | 20:09 |
mriedem | http://logs.openstack.org/15/60215/2/check/gate-swift-python26/1b3754e/console.html#_2013-12-13_19_31_29_115 | 20:09 |
sdague | mriedem: yes | 20:09 |
*** dprince has quit IRC | 20:10 | |
fungi | jeblair: clarkb: jobs are actually underway on jenkins-dev, just none i would expect to need these nodepool nodes | 20:10 |
fungi | anyway, getting debug output | 20:10 |
mriedem | sdague: so going back to the strict message i had first | 20:10 |
sdague | mriedem: yeh | 20:10 |
praneshp | hey clarkb: thanks a lot, my review passed jenkins | 20:10 |
praneshp | *patch | 20:10 |
mfer | clarkb any chance i could get you to look at https://review.openstack.org/#/c/62069/ ... or maybe there's someone else i could hit up | 20:10 |
clarkb | praneshp: np | 20:10 |
jeblair | mriedem, sdague: current timeout values for d-g jobs are 60,90,120 | 20:11 |
jeblair | mriedem, sdague: we could change them to 61,91,121 for better fingerprinting | 20:11 |
sdague | you could match job name | 20:11 |
clarkb | mfer: currently busy trying to make jenkins more reliable. also manage-projects is still giving us grief... probably won't get to it today | 20:11 |
jeblair | sdague: oh, right, that's a different field so you can match it. that's better. :) | 20:12 |
sdague | build_name is a valid thing to match | 20:12 |
mfer | clarkb kk | 20:12 |
fungi | jeblair: clarkb: debug log from daemon start to end of demand analysis... http://paste.openstack.org/show/54971 | 20:12 |
fungi | jeblair: clarkb: so i think that's our answer | 20:13 |
clarkb | oh it has jobs queued | 20:13 |
fungi | it wants to run some jobs on them ;) | 20:13 |
clarkb | fungi: that is good, it means we will get end to end testing :) | 20:13 |
fungi | mystery solved | 20:13 |
mriedem | sdague: jeblair: but can you do ORs? | 20:13 |
clarkb | mriedem: yes | 20:13 |
sdague | mriedem: yes | 20:14 |
sdague | and you can use parens to group | 20:14 |
mriedem | can i use ternary operators? :) | 20:14 |
jeblair | fungi: cool. error: situation normal. :) | 20:14 |
fungi | clarkb: i'm going to clear old nodepool nodes manually out of jenkins-dev too | 20:14 |
clarkb | fungi: o | 20:14 |
fungi | jeblair: yes, very much so | 20:14 |
clarkb | *ok | 20:14 |
clarkb | mriedem: http://lucene.apache.org/core/2_9_4/queryparsersyntax.html | 20:15 |
clarkb | that is for an older version of lucene but I think the query syntax hasn't changed | 20:15 |
*** Abhishe__ has quit IRC | 20:15 | |
*** UtahDave1 has joined #openstack-infra | 20:16 | |
*** mrodden has quit IRC | 20:16 | |
*** UtahDave has quit IRC | 20:17 | |
*** UtahDave1 is now known as UtahDave | 20:17 | |
fungi | clarkb: watching jenkins-dev, i think we still have some old periodic jobs which need to be deleted from it | 20:17 |
Alex_Gaynor | dhellmann: Good catch -- I totally missed the existing +2 | 20:17 |
Alex_Gaynor | (and then I missed the follow up comment, not doing real well today am I?) | 20:17 |
clarkb | fungi: probably | 20:17 |
dhellmann | Alex_Gaynor: yeah, I do that sometimes so I figured that's what it was | 20:17 |
fungi | clarkb: particularly the devstack-reap-vms-* jobs and similar | 20:17 |
* fungi fixes | 20:17 | |
fungi | though hopefully they no longer match the new nodepool names | 20:18 |
*** mrodden has joined #openstack-infra | 20:18 | |
mriedem | message:"Build timed out (after" AND message:"minutes). Marking the build as failed." AND (build_name:"check-tempest-dsvm-postgres-full" OR build_name:"check-tempest-dsvm-full") AND filename:"console.html" | 20:19 |
*** zehicle_at_dell has joined #openstack-infra | 20:19 | |
sdague | mriedem: s/check/gate/ ? | 20:20 |
mriedem | sdague: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQnVpbGQgdGltZWQgb3V0IChhZnRlclwiIEFORCBtZXNzYWdlOlwibWludXRlcykuIE1hcmtpbmcgdGhlIGJ1aWxkIGFzIGZhaWxlZC5cIiBBTkQgKGJ1aWxkX25hbWU6XCJjaGVjay10ZW1wZXN0LWRzdm0tcG9zdGdyZXMtZnVsbFwiIE9SIGJ1aWxkX25hbWU6XCJjaGVjay10ZW1wZXN0LWRzdm0tZnVsbFwiKSBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQ | 20:20 |
mriedem | sdague: should i duplicate the build_names in the query for check and gate? | 20:21 |
mriedem | otherwise the e-r query won't hit on check failures and people will have to hunt for it | 20:21 |
sdague | clarkb: do we have globbing in that field? | 20:21 |
clarkb | sdague: yes, but not at the beginning of the field (lucene limitation) | 20:21 |
clarkb | also you have to remove the quotes to glob so | 20:22 |
clarkb | check-tempest-* OR gate-tempest-* should work | 20:22 |
clarkb | I wish I could will the image build to go faster :) | 20:22 |
mriedem | message:"Build timed out (after" AND message:"minutes). Marking the build as failed." AND (build_name:check-tempest-* OR build_name:gate-tempest-*) AND filename:"console.html" | 20:23 |
sdague | message:"Build timed out (after" AND message:"minutes). Marking the build as failed." AND filename:"console.html" AND (build_name:gate-tempest* OR build_name:check-tempest*) | 20:23 |
mriedem | essentially the same | 20:23 |
sdague | yeh, that | 20:23 |
mriedem | i'm back to my original number of hits | 20:23 |
mriedem | so looks good | 20:23 |
sdague | yep, we were going at it the same time | 20:23 |
sdague | yep, looks solid | 20:24 |
sdague | push that and I'll land it | 20:24 |
sdague | only 5 hits in the gate | 20:24 |
sdague | which is nice | 20:24 |
*** jcooley_ has quit IRC | 20:24 | |
sdague | so it's not actually resetting much | 20:24 |
clarkb | oh grenade | 20:24 |
clarkb | should add grenade because that is timing out a bunch, right? | 20:25 |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 20:25 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:25 |
*** zehicle has joined #openstack-infra | 20:26 | |
jeblair | mriedem: ^ see clarkb's comment | 20:28 |
sdague | so I'm landing mriedem's current patch, but a follow up to add would be accepted | 20:28 |
*** zehicle_at_dell has quit IRC | 20:29 | |
mriedem | check-grenade-* and gate-grenade-* right? | 20:29 |
mordred | backscroll! | 20:30 |
mordred | also | 20:30 |
mordred | the internet works | 20:30 |
mordred | I can type | 20:30 |
mordred | so happy | 20:30 |
mordred | morning everyone | 20:30 |
*** rcarrillocruz has joined #openstack-infra | 20:30 | |
clarkb | mordred: good morning | 20:30 |
clarkb | fungi: we have an image id! | 20:30 |
mfer | mordred good morning | 20:30 |
*** rongze has joined #openstack-infra | 20:31 | |
clarkb | fungi: almost done building I think | 20:31 |
*** rcarrillocruz1 has quit IRC | 20:31 | |
sdague | mriedem: yes | 20:31 |
anteaya | morning mordred | 20:32 |
sdague | also, just a style thing, I've been putting the conjunctions after the break | 20:32 |
*** Ryan_Lane has quit IRC | 20:33 | |
anteaya | looking at the gerrit merge log, once 24 hours has passed is there a way of seeing the merges that happened at a specific timestamp | 20:33 |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add grenade jobs to the bug 1258682 e-r query https://review.openstack.org/62084 | 20:33 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:33 |
anteaya | or at least to the closest hour? | 20:33 |
mriedem | clarkb: sdague: https://review.openstack.org/62084 | 20:33 |
anteaya | once 0000 utc occurs everything is just attributed to the same date | 20:33 |
jeblair | anteaya: you can use an ssh query | 20:33 |
jeblair | anteaya: or the git log. or the git log for openstack/openstack. | 20:33 |
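Hedged examples of the two approaches jeblair mentions; the project name and time window below are placeholders, and the Gerrit host/port are the usual defaults:

    # ssh query: merged changes for a project; each JSON record carries a
    # lastUpdated timestamp that can be filtered on
    ssh -p 29418 review.openstack.org gerrit query --format=JSON status:merged project:openstack-infra/config limit:25

    # git log: merge commits in a local clone within a given window
    git log --merges --since="2013-12-12 00:00" --until="2013-12-13 00:00" --format="%h %ci %s"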
anteaya | okay thanks | 20:34 |
sdague | mriedem: landed | 20:34 |
*** gyee has quit IRC | 20:35 | |
mordred | jeblair: in the airport this morning, jog0 requested that we add the infra repos to openstack/openstack - I think it might be an interesting idea - possibly in an infra subdir to be clear about what they are | 20:35 |
fungi | anteaya: yes, like i suggested yesterday, you can see it in cgit if you like browsers... http://git.openstack.org/cgit/openstack/oslo.messaging/log/ | 20:36 |
fungi | (otherwise, the git log command) | 20:36 |
jeblair | mordred: why? | 20:36 |
mordred | jeblair: but his specific question was that when he's trying to track down when something started acting wonky, he's been using o/o to walk backwards and look at system state | 20:36 |
anteaya | fungi: yes, thanks | 20:36 |
clarkb | fungi: waiting for the image to leave the building state is like watching paint dry | 20:36 |
*** rongze has quit IRC | 20:36 | |
mordred | jeblair: so knowing what various infra things looked like around the time of commit X was a thing he was looking to be able to do | 20:36 |
clarkb | fungi: just want to ready, we have two slaves building | 20:36 |
fungi | yup | 20:37 |
*** zehicle has quit IRC | 20:37 | |
jeblair | mordred: that has limited utility with infra; most of our changes take effect between 10 and 1440 minutes after the commit lands... | 20:37 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 20:37 |
*** jcooley_ has joined #openstack-infra | 20:37 | |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:37 |
mordred | jeblair: yeah - that's what I said - and devstack and devstack-gate are already in there | 20:38 |
mordred | but I guess there are potentially things in config, such as job changes, that might be helpful to look at? I feel non-strongly in either direction | 20:38 |
jeblair | mordred: i don't think that was really the intent behind openstack/openstack (i mean, gee, we could just log gerrit merges if that's what's needed) | 20:39 |
*** jcooley_ has quit IRC | 20:39 | |
*** zehicle_at_dell has joined #openstack-infra | 20:39 | |
openstackgerrit | Michael Still proposed a change to openstack-infra/config: Add project configuration. https://review.openstack.org/62085 | 20:39 |
mordred | jeblair: nod | 20:39 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add grenade jobs to the bug 1258682 e-r query https://review.openstack.org/62084 | 20:40 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:40 |
jeblair | mordred: so i'm inclined to say "let's not" and if jog0 is extremely persuasive that it's totally useful and he's solved all kinds of problems by having the infra git log on the screen with the openstack git log, maybe we look at doing that or a git merge log thing... | 20:40 |
mordred | jeblair: kk. works for me | 20:40 |
sdague | jeblair: now that you have your awesome priority tool - could you bump this to the top of the queue - https://review.openstack.org/#/c/61428/ ? | 20:42 |
clarkb | fungi: we have slaves! | 20:42 |
sdague | markmcclain thinks that may solve the ssh race (or at least make it a ton better) | 20:42 |
fungi | in jenkins-dev and everything | 20:42 |
clarkb | fungi: they aren't running jobs like I expected though | 20:42 |
fungi | clarkb: but not actually running any jobs | 20:42 |
sdague | basically a neutron + nova change set pair needed to land, the neutron one did, the nova one did not | 20:42 |
fungi | jinx | 20:42 |
*** melwitt has joined #openstack-infra | 20:42 | |
sdague | the massive uptick corresponds to the neutron one landing | 20:43 |
*** eharney has joined #openstack-infra | 20:43 | |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Rename the subscriber map to be a more generic config file. https://review.openstack.org/62073 | 20:43 |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Allow configurable mappings to different LP projects https://review.openstack.org/62074 | 20:43 |
jeblair | sdague: ack, i'll start on that. | 20:43 |
sdague | it's speculation, but good speculation | 20:43 |
clarkb | fungi: With that in place I am going to grab lunch very quickly | 20:43 |
fungi | sdague: i'm betting it was for the CVE-2013-6419 fix? | 20:44 |
uvirtbot | fungi: ** RESERVED ** This candidate has been reserved by an organization or individual that will use it when announcing a new security problem. When the candidate has been publicized, the details for this candidate will be provided. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-6419) | 20:44 |
fungi | clarkb: k | 20:44 |
sdague | fungi: it's your patch... so you tell me :) | 20:44 |
fungi | sdague: then yes | 20:44 |
clarkb | fungi: maybe we should just manually trigger some jobs on there, I think that will be sufficient for the nodepool node removal stuff to happen | 20:44 |
clarkb | (by manually I mean via gearman) | 20:44 |
jeblair | zuul promote --pipeline gate --changes 61428,2 | 20:44 |
fungi | sdague: i had to pester the nova devs a but for approval on their half, so it lost a day and the neutron part went in first | 20:45 |
fungi | er, a bit | 20:45 |
fungi | jeblair: magic! | 20:46 |
dkehn | clarkb: quickq: with the cirros-0.3.1-x86_64 images, when trying to ssh into them what is username/password? | 20:46 |
mordred | "clarkb | fungi: we have slaves!" | 20:46 |
mordred | clarkb, fungi: does that mean nodepool static slave replacement? | 20:46 |
fungi | mordred: if only it was meant the way you read it | 20:46 |
fungi | mordred: though yes, we do | 20:46 |
fungi | mordred: there are already several infra jobs dogfooding on the nodepool bare slaves | 20:47 |
*** melwitt has quit IRC | 20:47 | |
fungi | mordred: but we were talking about nodepool dev slaves on jenkins-dev | 20:47 |
jeblair | mordred: fungi and clarkb are working on jenkins-dev; we are using nodepool slaves for some infra jobs, but not generally yet | 20:47 |
mordred | neat | 20:48 |
* mordred is very behind - but thinks everyone is great | 20:48 | |
jeblair | clarkb, fungi, mordred: fyi the zuul promote command waits for the queue to be completely reset before returning; that means it can take a while. | 20:49 |
fungi | jeblair: noted | 20:49 |
mordred | jeblair: thanks | 20:49 |
mordred | jeblair: also, baller command | 20:49 |
*** talluri has quit IRC | 20:50 | |
openstackgerrit | Michael Still proposed a change to openstack-infra/config: Add project configuration. https://review.openstack.org/62085 | 20:50 |
*** esker has joined #openstack-infra | 20:50 | |
fungi | i did like "jump the queue" but shortened to just "jump" it lost a bit of its contextual meaning as a command-line | 20:50 |
*** talluri has joined #openstack-infra | 20:50 | |
jeblair | 6 minutes in this case | 20:50 |
dkranz | clarkb: Sorry, was away. I think your logic was fine and I didn't change it. But unlike previous attempts I tried to run the code locally and got syntax errors that I could not figure out. | 20:51 |
jeblair | fungi: yeah, promote was the only one that read correctly to me as a direct object | 20:51 |
*** vkozhukalov has joined #openstack-infra | 20:51 | |
fungi | wfm | 20:51 |
fungi | more important is that it does what we want, which it seems to | 20:51 |
dkranz | clarkb: So I pushed the same logic using the subset of bash I actually understand. | 20:51 |
fungi | jeblair: clarkb: presumably we should be using a modified nodepool node name for the slaves added to jenkins-dev so we can tell them apart in a nova list more easily? | 20:52 |
dkranz | clarkb: This is an important change so at this point I suggest accepting what I pushed if it is correct, or some one who really knows bash just take over the patch. | 20:52 |
*** melwitt has joined #openstack-infra | 20:53 | |
jeblair | fungi: it would be nice, though that affects jjb and zuul config. not sure the right answer, but i won't be upset if we accidentally delete a dev slave. | 20:54 |
fungi | jeblair: okay, i won't worry too much about it for now | 20:54 |
*** talluri has quit IRC | 20:54 | |
fungi | and yeah, the stability of these slaves is beneath concern | 20:55 |
*** harlowja has joined #openstack-infra | 20:56 | |
*** sdake_ is now known as sdake-OOO | 20:56 | |
*** sdake is now known as sdake-OOO2 | 20:57 | |
*** dolphm has quit IRC | 20:59 | |
*** zehicle_at_dell has quit IRC | 20:59 | |
jeblair | we should really get rid of gate-noop before going to all-dynamic slaves | 21:00 |
dkehn | clarkb: quickq: with the cirros-0.3.1-x86_64 images, when trying to ssh into them what is username/password? | 21:00 |
fungi | dkehn: clarkb is out to lunch, but the internets tell me that you can log in as the "cirros" user with a password of "cubswin:)" | 21:02 |
fungi | someone is obviously a chicagoan | 21:02 |
dkehn | fungi: thxs | 21:02 |
fungi | np | 21:03 |
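In case anyone needs the full recipe later: logging in to a booted cirros guest is ordinary password-based ssh with that account (the instance address below is a placeholder):

    # log in as the "cirros" user and enter the password at the prompt
    ssh cirros@203.0.113.10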
*** Ryan_Lane has joined #openstack-infra | 21:04 | |
*** jcooley_ has joined #openstack-infra | 21:05 | |
*** AaronGr is now known as AaronGr_afk | 21:13 | |
*** mfer_ has joined #openstack-infra | 21:15 | |
*** mfer has quit IRC | 21:16 | |
*** mfer_ has quit IRC | 21:16 | |
*** mfer has joined #openstack-infra | 21:16 | |
*** mfer has quit IRC | 21:17 | |
*** mfer has joined #openstack-infra | 21:17 | |
*** ArxCruz has joined #openstack-infra | 21:18 | |
*** mfer has quit IRC | 21:19 | |
*** zehicle_at_dell has joined #openstack-infra | 21:20 | |
*** mfer has joined #openstack-infra | 21:20 | |
*** smarcet has left #openstack-infra | 21:20 | |
*** mfer has quit IRC | 21:20 | |
*** mfer has joined #openstack-infra | 21:21 | |
clarkb | I am back | 21:21 |
fungi | clarkb: i hacked up a copy of trigger-job.py in my homedir on jenkins-dev and tried to use it to inject the parameters for https://jenkins01.openstack.org/job/gate-tempest-dsvm-full/2194/parameters/ (third window of the screen session there) but no luck, just sits and no slave gets assigned. what bits may i be missing? | 21:22 |
*** mfer has quit IRC | 21:22 | |
*** mfer has joined #openstack-infra | 21:23 | |
clarkb | fungi: I don't think jenkins-dev knows about that job | 21:23 |
fungi | oh, hurr | 21:23 |
clarkb | https://jenkins-dev.openstack.org/job/gate-tempest-devstack-vm-full/ is the job it knows about | 21:23 |
fungi | yeah | 21:24 |
fungi | i guess we need to rerun jjb on it? | 21:24 |
clarkb | fungi: maybe | 21:25 |
fungi | or i can just sub out the job name | 21:25 |
fungi | trying that first | 21:25 |
clarkb | ok | 21:25 |
fungi | aha, node labels | 21:27 |
*** syerrapragada has joined #openstack-infra | 21:28 | |
*** changbl_ has quit IRC | 21:29 | |
*** changbl has joined #openstack-infra | 21:29 | |
*** ^demon|away is now known as ^d | 21:29 | |
dkranz | clarkb: Did you see my comments above? | 21:30 |
*** alcabrera has quit IRC | 21:30 | |
fungi | clarkb: well, no dice. i changed that job to look for devstack-precise (which matches the label on those nodes) rather than dev-devstack-precise, then retried to trigger the job, but still not much going on | 21:31 |
*** gyee has joined #openstack-infra | 21:31 | |
*** ArxCruz has quit IRC | 21:31 | |
fungi | i wonder if the parameter list for the job needs to match the parameters i'm passing with the script now :/ | 21:32 |
*** vkozhukalov has quit IRC | 21:33 | |
clarkb | oh could be | 21:33 |
clarkb | fungi: also is gearman hooked up properly/ | 21:34 |
fungi | ooh, good question | 21:34 |
* fungi checks the plugin | 21:34 | |
*** jasond has joined #openstack-infra | 21:35 | |
fungi | installed, though a couple of revs behind | 21:35 |
jasond | is something wrong with the gate jobs? this seems to be stuck https://review.openstack.org/#/c/59851/ | 21:35 |
fungi | clarkb: we should probably update that anyway from a proper new-jenkins testing perspective | 21:36 |
*** esker has quit IRC | 21:37 | |
fungi | jasond: stuck how? i see it being tested in the gate pipeline on http://status.openstack.org/zuul/ | 21:37 |
clarkb | fungi: ++ | 21:37 |
fungi | clarkb: updating it now | 21:38 |
jasond | fungi: it still says "Need Verified" after a reverify 5 hours ago. so it's working as expected? | 21:39 |
fungi | jasond: yes, that means it's in the gate for testing. there are 17 changes still ahead of it by my count | 21:41 |
*** jcooley_ has quit IRC | 21:41 | |
jasond | fungi: oh ok. thanks for checking | 21:41 |
fungi | depending on how many of those fail, could still be a while | 21:41 |
fungi | sdague: the theory that https://review.openstack.org/61428 would address ssh timeouts seems to be debunked. after being promoted to the head of the gate it failed on gate-tempest-dsvm-neutron on bug 1253896 | 21:44 |
uvirtbot | Launchpad bug 1253896 in tempest "Attempts to verify guests are running via SSH fails. SSH connection to guest does not work." [Critical,Confirmed] https://launchpad.net/bugs/1253896 | 21:44 |
jasond | fungi: i noticed that jenkins' vote has been removed since the last reverify. do i need to recheck again? | 21:44 |
fungi | jasond: that gets removed automatically, until gate testing concludes | 21:44 |
fungi | then it will either get a green checkmark or a red x in that column | 21:45 |
jasond | fungi: okay, thanks | 21:45 |
openstackgerrit | Elizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add 2 new ci publication branches to gerritbot https://review.openstack.org/62095 | 21:46 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Process logs with CRM114 https://review.openstack.org/62096 | 21:47 |
jeblair | clarkb, fungi, mordred: ^ | 21:47 |
anteaya | fungi: markmcclain has another candidate | 21:48 |
jeblair | clarkb, fungi, mordred: crm114 is a fun language. :) | 21:48 |
anteaya | fungi: he is in a meeting right now and then will address it | 21:49 |
fungi | jeblair: i expect to set aside some time this weekend to revel in it | 21:49 |
fungi | anteaya: thanks for the update. i was mainly just passing along the result | 21:49 |
anteaya | yeah | 21:49 |
*** AaronGr_afk is now known as AaronGr | 21:49 | |
jeblair | "Because the commonest use of LIAF is in iteration, LIAF means Loop Iterate Awaiting Failure. If that's too hard to remember, just pretend that LIAF is FAIL spelled backwards." | 21:49 |
anteaya | you are correct if it failed on the bug, it is highly unlikely it is the fix for it | 21:50 |
fungi | heh | 21:50 |
clarkb | jeblair: is that from a how to doc? | 21:50 |
*** harlowja has quit IRC | 21:50 | |
clarkb | fungi: where are you running the gearman client? | 21:50 |
fungi | clarkb: locally on jenkins-dev... should i not? | 21:51 |
clarkb | fungi: I just did a netstat -ln and don't see a port 4730 listening. is zuul-dev still a thing? I bet that is where we need to run it | 21:51 |
fungi | clarkb: should be on 127.0.0.1 | 21:51 |
clarkb | fungi: right I think zuul-dev is running the gearman server that jenkins-dev is connected to | 21:52 |
fungi | clarkb: but was just going to surmise we might need one. i believe the jenkins-gearman plugin is going to refuse to activate if it can't connect to a gearman server | 21:52 |
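A quick hedged way to check whether a gear server is up and has workers registered is the gearman admin protocol on port 4730 (assuming nc is available on the host):

    # "status" lists registered functions with queued/running/worker counts,
    # "workers" lists the connected worker clients (e.g. the jenkins masters)
    echo status | nc 127.0.0.1 4730
    echo workers | nc 127.0.0.1 4730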
jeblair | clarkb: it's from a 283 page non-free book. :( | 21:52 |
*** jcooley_ has joined #openstack-infra | 21:52 | |
clarkb | fungi: yup looks like zuul-dev. I would give your command a shot there | 21:52 |
fungi | clarkb: aha. zuul-dev *does* exist. i'll try there | 21:52 |
fungi | i just found it as well | 21:53 |
clarkb | jeblair: you'll just have to explain everything then :) | 21:53 |
*** SergeyLukjanov_ has quit IRC | 21:53 | |
jeblair | clarkb: it's distributed with the project. i dunno what the licensing deal is with the book. fortunately, the software is clear. ;) | 21:54 |
*** vkozhukalov has joined #openstack-infra | 21:54 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Process logs with CRM114 https://review.openstack.org/62096 | 21:54 |
jeblair | requisite pep8 fix ^ | 21:55 |
clarkb | jeblair: oh I see, the book is available just not free | 21:55 |
fungi | clarkb: oh, after the jenkins-dev restart, nodepool deleted those slaves so it'll be a bit before new ones are enrolled | 21:56 |
*** jcooley_ has quit IRC | 21:57 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Process logs with CRM114 https://review.openstack.org/62096 | 21:57 |
clarkb | fungi: was it supposed to delete them? | 21:57 |
fungi | clarkb: dunno, but the age on them is about right | 21:57 |
*** CaptTofu has joined #openstack-infra | 21:58 | |
clarkb | fungi: that seems odd to me, but can probably be ignored for now | 21:58 |
fungi | aha, i think it may be having trouble reconnecting to the gearman plugin on jenkins-dev | 21:58 |
anteaya | markmcclain feels that this patch: https://review.openstack.org/#/c/62098/ (a reversion of an rpc patch) may address bug 1253896 | 21:58 |
uvirtbot | Launchpad bug 1253896 in tempest "Attempts to verify guests are running via SSH fails. SSH connection to guest does not work." [Critical,Confirmed] https://launchpad.net/bugs/1253896 | 21:58 |
anteaya | any chance of it getting a priority in the check queue? | 21:59 |
fungi | clarkb: anyway, i'm being paged to go out to dinner now, so i'll have to continue this once i return | 21:59 |
fungi | but i'll restart the nodepoold first | 21:59 |
*** mfer has quit IRC | 21:59 | |
*** harlowja has joined #openstack-infra | 22:00 | |
openstackgerrit | Elizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add 2 new ci publication branches to gerritbot https://review.openstack.org/62095 | 22:00 |
pleia2 | sneaky whitespace | 22:00 |
*** sarob has joined #openstack-infra | 22:00 | |
clarkb | fungi: ok, I can try triggering the job by hand over on zuul-dev | 22:01 |
*** esker has joined #openstack-infra | 22:01 | |
*** jcooley_ has joined #openstack-infra | 22:01 | |
*** rcarrillocruz has quit IRC | 22:02 | |
fungi | clarkb: what i was *going* to run is... (in ~/zuul with . venv/bin/activate) ./tools/trigger-job.py --job gate-tempest-devstack-vm-full --project openstack/nova --pipeline gate --newrev 1436c1707a127dc82136b1046934c8a56b558a0a --refname refs/zuul/master/Z2d5c1f5108fa490a8971e381fd423a09 --logpath 28/61428/2/gate/gate-tempest-devstack-vm-full/7e1d10e,2 | 22:02 |
fungi | note that the zuul on zuul-dev is too old to have trigger-job.py | 22:02 |
fungi | so it may also be too old to support it, not sure yet | 22:02 |
fungi | oh, and i just realized i didn't modify it to support all the additional parameters a gate job would want like i did the copy i was initially testing on jenkins-dev | 22:03 |
*** praneshp has quit IRC | 22:03 | |
fungi | but if you want it, it's in the same place in my homedir there | 22:03 |
jeblair | fungi: trigger-job doesn't affect zuul, it goes straight to the worker | 22:03 |
clarkb | fungi: ok thanks | 22:04 |
*** esker has quit IRC | 22:04 | |
fungi | jeblair: right, okay then it should be fine | 22:04 |
* fungi vanishes | 22:04 | |
openstackgerrit | A change was merged to openstack-infra/statusbot: Set world-readable permissions on alert file https://review.openstack.org/61588 | 22:05 |
jeblair | clarkb: let me know if you need anything | 22:06 |
clarkb | jeblair: will do, btw looking at the crm114 change I like how simple the actual mechanics of it are. Next step is CRM114 as a service? :) | 22:07 |
*** resker has joined #openstack-infra | 22:07 | |
clarkb | currently waiting for nodepool to spin up two slaves that I can trigger jobs against | 22:07 |
clarkb | it is running a job \o/ | 22:08 |
clarkb | I didn't have to do anything | 22:08 |
jeblair | clarkb: heh :) note there's a level there too -- we can disable the filter by removing it from the config yaml | 22:08 |
jeblair | s/level/lever/ | 22:08 |
*** praneshp has joined #openstack-infra | 22:08 | |
clarkb | slave 19 is running a devstack job | 22:09 |
hemanth_ | Hi, can anyone help me with this? http://logs.openstack.org/14/59814/8/check/gate-tempest-dsvm-neutron-large-ops/ee6bfe0/console.html | 22:09 |
hemanth_ | not really sure what that means | 22:09 |
clarkb | hemanth_: http://logs.openstack.org/14/59814/8/check/gate-tempest-dsvm-neutron-large-ops/ee6bfe0/logs/screen-g-api.txt.gz notice in the console log it was attempting to start glance when it failed | 22:10 |
*** thomasem has quit IRC | 22:11 | |
hemanth_ | clarkb: oops, thanks so much for pointing it! | 22:13 |
clarkb | jeblair: https://jenkins-dev.openstack.org/job/gate-tempest-devstack-vm-full/7896/console do we actually expect those jobs to run successfully? I think it may be too old | 22:14 |
clarkb | jeblair: but the job did run and nodepool put the slave into the delete state | 22:14 |
clarkb | and removed it from jenkins | 22:14 |
*** openstackstatus has quit IRC | 22:14 | |
*** openstackstatus_ has joined #openstack-infra | 22:14 | |
clarkb | slave is now completely gone from jenkins-dev | 22:14 |
*** openstackstatus_ is now known as openstackstatus | 22:14 | |
*** resker has quit IRC | 22:14 | |
jeblair | clarkb: yeah, i think that ip might be an old machine i was running | 22:15 |
jeblair | clarkb: long gone. so yeah, i wouldn't worry about the jobs themselves, just the mechanics around them. | 22:15 |
clarkb | jeblair: yeah the SCP thing doesn't bother me | 22:15 |
*** prad has quit IRC | 22:15 | |
clarkb | devstack stopping so quickly does bother me a bit | 22:15 |
*** openstackstatus has quit IRC | 22:15 | |
*** openstackstatus_ has joined #openstack-infra | 22:15 | |
*** openstackstatus_ is now known as openstackstatus | 22:16 | |
clarkb | jeblair: anything else you think we should look at before planning some rolling upgrades? | 22:16 |
jeblair | clarkb: it probably tried to fetch a zuul ref from prod | 22:16 |
jeblair | clarkb: (new ZUUL_URL feature could help with that) | 22:16 |
clarkb | old zuul refs maybe | 22:16 |
*** harlowja has quit IRC | 22:16 | |
clarkb | oh from review.o.o? | 22:16 |
jeblair | clarkb: no i mean i think the jobs are the same jobs as in production, so it tried to fetch from zuul.o.o not zuul-dev.o.o | 22:16 |
clarkb | oh right | 22:17 |
*** openstackstatus has quit IRC | 22:17 | |
*** openstackstatus_ has joined #openstack-infra | 22:17 | |
*** openstackstatus_ is now known as openstackstatus | 22:17 | |
jeblair | i'll go fix statusbot | 22:17 |
*** openstackstatus has quit IRC | 22:17 | |
*** openstackstatus has joined #openstack-infra | 22:18 | |
*** openstackstatus has quit IRC | 22:18 | |
zaro | clarkb: https://issues.jenkins-ci.org/browse/JENKINS-21006 | 22:18 |
*** openstackstatus has joined #openstack-infra | 22:18 | |
jeblair | zaro: neat, thanks | 22:19 |
jeblair | clarkb: do a jjb run? delete at least one job from the cache so it does something.. | 22:20 |
jeblair | clarkb: other than that, the only thing i would think is burn-in -- do we want to leave it running for a few days to see if it leaks or explodes with nodepool annoying it all the time? | 22:22 |
clarkb | jeblair: we can | 22:22 |
clarkb | jeblair: looks like you ran JJB by hand on jenkins-dev. doing that now | 22:23 |
*** harlowja has joined #openstack-infra | 22:24 | |
clarkb | jeblair: we don't have jjb running periodically out of a system location there, so I won't worry about cache and just apply all the jobs | 22:24 |
jeblair | k | 22:25 |
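For reference, a by-hand jenkins-job-builder run of the sort being described looks roughly like the following; the cache and config paths are assumptions rather than the exact locations on jenkins-dev:

    # drop the local jjb cache so jobs are re-pushed even if unchanged
    rm -rf ~/.cache/jenkins_jobs/
    # then apply the job definitions to the master named in the ini file
    jenkins-jobs --conf /etc/jenkins_jobs/jenkins_jobs.ini update /etc/jenkins_jobs/config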
*** jcooley_ has quit IRC | 22:25 | |
*** esker has joined #openstack-infra | 22:25 | |
jeblair | clarkb: if you want to start with the rolling upgrades without burning in on -dev, that's fine. we do have 2 masters. | 22:25 |
*** AlexF has joined #openstack-infra | 22:25 | |
*** jerryz has quit IRC | 22:25 | |
clarkb | jeblair: part of me wants to, the other part of me realizes the weekend is near | 22:25 |
*** harlowja has quit IRC | 22:27 | |
openstackgerrit | A change was merged to openstack-infra/config: Fix serving alert json file on eavesdrop https://review.openstack.org/61593 | 22:28 |
*** harlowja has joined #openstack-infra | 22:28 | |
openstackgerrit | A change was merged to openstack-infra/config: Don't re-exec in check-dg-tempest-dsvm-full https://review.openstack.org/61569 | 22:29 |
*** resker has joined #openstack-infra | 22:32 | |
*** esker_ has joined #openstack-infra | 22:33 | |
*** esker has quit IRC | 22:34 | |
clarkb | JJB is creating a bunch of jobs, seems to be happy | 22:35 |
clarkb | jeblair: maybe we upgrade jenkins.o.o today then do 01 and 02 monday? | 22:35 |
*** vkozhukalov has quit IRC | 22:35 | |
clarkb | that will give us a bit more burn in on less active machines | 22:35 |
*** denis_makogon_ has joined #openstack-infra | 22:35 | |
jeblair | clarkb: i'd rather do jenkins.o.o last since it's not HA | 22:35 |
clarkb | oh good point | 22:35 |
clarkb | reverting is relatively easy, I am very tempted to go ahead and try 01 | 22:36 |
*** resker has quit IRC | 22:37 | |
*** AlexF has quit IRC | 22:37 | |
*** dangers is now known as danger_fo_away | 22:39 | |
*** AlexF has joined #openstack-infra | 22:40 | |
*** weshay has quit IRC | 22:40 | |
*** jasond has quit IRC | 22:42 | |
*** paul-- has joined #openstack-infra | 22:42 | |
*** ryanpetrello has quit IRC | 22:44 | |
clarkb | jeblair: JJB seems to have been fine, no apparent errors | 22:48 |
jeblair | clarkb: cool | 22:48 |
clarkb | jeblair: how do you feel about upgrading 01 or 02 today? My only concern is I will be in CA early next week and may not have as much time to babysit then | 22:49 |
*** CaptTofu has quit IRC | 22:49 | |
*** ^d has quit IRC | 22:50 | |
*** CaptTofu has joined #openstack-infra | 22:50 | |
*** esker_ has quit IRC | 22:50 | |
jeblair | clarkb: wfm | 22:50 |
clarkb | ok putting 01 in shutdown mode now | 22:52 |
*** rcleere has quit IRC | 22:54 | |
*** esker has joined #openstack-infra | 22:57 | |
*** dkliban has quit IRC | 22:58 | |
jeblair | clarkb: i'll be afk for a while, back in a bit | 22:58 |
*** bpokorny has quit IRC | 22:58 | |
clarkb | jeblair: ok ping me when you are back, hopefully 01 will be quiet by then | 23:01 |
*** mgagne has quit IRC | 23:03 | |
*** sarob has quit IRC | 23:07 | |
*** sarob has joined #openstack-infra | 23:08 | |
*** sarob has quit IRC | 23:09 | |
*** sarob has joined #openstack-infra | 23:09 | |
*** datsun180b has quit IRC | 23:10 | |
*** sarob has quit IRC | 23:11 | |
*** sarob has joined #openstack-infra | 23:11 | |
*** oubiwan__ has quit IRC | 23:12 | |
*** sarob has quit IRC | 23:12 | |
*** gyee has quit IRC | 23:13 | |
*** rcarrillocruz1 has joined #openstack-infra | 23:13 | |
*** sarob has joined #openstack-infra | 23:13 | |
*** rcarrillocruz2 has joined #openstack-infra | 23:14 | |
*** sarob has quit IRC | 23:15 | |
*** rcarrillocruz2 has quit IRC | 23:15 | |
*** sarob has joined #openstack-infra | 23:15 | |
*** AlexF has quit IRC | 23:16 | |
*** sarob has quit IRC | 23:16 | |
*** rcarrillocruz1 has quit IRC | 23:17 | |
*** sarob has joined #openstack-infra | 23:18 | |
*** rnirmal has quit IRC | 23:20 | |
*** fbo is now known as fbo_away | 23:20 | |
*** sarob has quit IRC | 23:21 | |
nikhil__ | hi | 23:21 |
clarkb | hello | 23:21 |
*** sarob has joined #openstack-infra | 23:21 | |
nikhil__ | hey clarkb | 23:21 |
nikhil__ | can you please help me figure out | 23:21 |
nikhil__ | if there's a typo in https://jenkins01.openstack.org/job/check-grenade-dsvm/2036/console ? | 23:22 |
nikhil__ | 2013-12-13 22:51:35.733 | [ERROR] ./grenade.sh:263 Failure in upgrade-glancwe | 23:22 |
clarkb | looks like it | 23:23 |
nikhil__ | that is one of the jenkins runs | 23:23 |
clarkb | git grep glancwe in the grenade repo will show you where | 23:23 |
clarkb | right, but the typo is in grenade | 23:23 |
nikhil__ | oh, is that in the openstack-infra project? | 23:23 |
*** sarob has quit IRC | 23:24 | |
clarkb | no | 23:24 |
*** sarob_ has joined #openstack-infra | 23:24 | |
clarkb | it is an openstack-dev project like devstack | 23:24 |
nikhil__ | oh | 23:24 |
anteaya | nikhil__: http://git.openstack.org/cgit/openstack-dev/grenade/tree/ | 23:24 |
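A hedged sketch of the check clarkb suggests, starting from the repository anteaya linked (the exact clone URL form is an assumption):

    # clone grenade and locate the typo'd target
    git clone https://git.openstack.org/openstack-dev/grenade
    cd grenade
    git grep -n glancwe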
clarkb | jeblair: fungi: jenkins01 will be idle any minute now. Let me know when at least one of you is around. Though I may go ahead and upgrade jenkins01 if I don't hear from you guys in a bit just for the sake of time | 23:25 |
nikhil__ | thanks clarkb anteaya , checking it out now | 23:25 |
*** sarob_ has quit IRC | 23:27 | |
*** blamar has quit IRC | 23:28 | |
*** sarob has joined #openstack-infra | 23:29 | |
*** sarob has quit IRC | 23:30 | |
*** sarob has joined #openstack-infra | 23:31 | |
reed | sarob, to create a new list https://wiki.openstack.org/wiki/Community#Mailing_lists_in_local_languages | 23:32 |
*** sarob has quit IRC | 23:32 | |
jeblair | clarkb: re | 23:33 |
clarkb | jeblair: there is one job on 01 currently running on hpcloud region b. I think it has a couple more minutes | 23:34 |
*** sarob has joined #openstack-infra | 23:34 | |
*** sarob has quit IRC | 23:35 | |
*** praneshp has quit IRC | 23:36 | |
*** sarob has joined #openstack-infra | 23:36 | |
fungi | okay, back... checking scrollback to see where we are | 23:37 |
clarkb | fungi: jenkins-dev seemed happy with nodepool and jjb so I have put jenkins01 in shutdown mode, waiting on one job there before upgrading | 23:37 |
clarkb | fungi: I will be in CA early next week so figured doing this now was beneficial despite being friday | 23:38 |
fungi | yep, great! | 23:38 |
fungi | so once jenkins-dev's nodepool built new slaves it picked up on the corrected node labels i guess? | 23:38 |
clarkb | fungi: I guess, because the jobs started running | 23:38 |
fungi | wondering whether the jenkins-gearman plugin upgrade had anything to do with tat | 23:38 |
fungi | that | 23:38 |
*** praneshp has joined #openstack-infra | 23:39 | |
clarkb | possibly, maybe it couldn't handle the job data being sent previously | 23:39 |
fungi | so you didn't actually have to manually trigger any jobs at all i guess. too awesome | 23:39 |
fungi | interestingly, jenkins-dev has one devstack slave which is already marked offline but is running a tempest job. slightly odd... | 23:40 |
clarkb | fungi: the nodes get marked offline when they start the jobs | 23:40 |
*** sarob has quit IRC | 23:41 | |
fungi | however, it also thinks that tempest job should only take a total of ~2 minutes | 23:41 |
*** sarob has joined #openstack-infra | 23:41 | |
clarkb | fungi: yeah the job is failing, jeblair thinks it is trying to clone zuul refs from zuul.o.o and not zuul-dev | 23:41 |
fungi | ahh, upload timeouts | 23:41 |
clarkb | but the mechanics of add node, delete node, seem fine | 23:41 |
clarkb | fungi: upload timeouts are because jeblair killed the scp endpoint | 23:41 |
fungi | it probably is trying to clone from zuul.o.o | 23:42 |
*** sarob has quit IRC | 23:42 | |
fungi | zuul-dev has too old of a zuul to pass the ZUUL_URL parameter | 23:42 |
clarkb | this regionb slave is taking forever | 23:43 |
*** sarob has joined #openstack-infra | 23:43 | |
clarkb | almost tempted to kill a job and leave a comment on the change apologizing | 23:44 |
*** sarob has quit IRC | 23:45 | |
fungi | clarkb: assuming it's https://jenkins01.openstack.org/job/check-tempest-dsvm-full/2367/ the change already failed another dsvm job anyway | 23:45 |
clarkb | thats the one | 23:45 |
clarkb | ok I will just manually kill it | 23:45 |
*** sarob has joined #openstack-infra | 23:45 | |
clarkb | fungi: want to leave the comment? | 23:45 |
fungi | it failed the postgres-full so it's getting a -1 from check regardless | 23:45 |
fungi | sure | 23:45 |
fungi | nova devs have grown a thick skin, i think ;) | 23:46 |
clarkb | going to give nodepool a minute or so to try and cleanup that node | 23:46 |
fungi | oh, and it's rustlebee's change anyway ;) | 23:46 |
*** sarob has quit IRC | 23:46 | |
clarkb | let me know when you are ready for me to stop jenkins, do the upgrade and start it again | 23:47 |
fungi | i should be nice to him, he did approve vulnerability fixes for me yesterday, after all | 23:47 |
fungi | clarkb: go for it | 23:47 |
clarkb | doing it now | 23:47 |
clarkb | it is starting | 23:48 |
*** sarob has joined #openstack-infra | 23:48 | |
*** sarob has quit IRC | 23:49 | |
*** sarob has joined #openstack-infra | 23:50 | |
clarkb | according to zuul it is running jobs, still waiting on the gui though | 23:50 |
sdague | hmmm... it looks like the only place we are finding new errors in logs is in grenade, which wasn't quite the intent of that job. | 23:50 |
*** flaper87 is now known as flaper87|afk | 23:50 | |
*** denis_makogon_ has quit IRC | 23:51 | |
sdague | I think it might be worth turning that off - https://review.openstack.org/#/c/62107/ | 23:51 |
*** esker has quit IRC | 23:52 | |
fungi | sdague: makes sense | 23:52 |
*** esker has joined #openstack-infra | 23:53 | |
fungi | error checks against stable, particularly, are going to be myriad until the icehouse release, i expect | 23:53 |
fungi | clarkb: jenkins01 looks happytimes | 23:54 |
jeblair | ooh neat you can collapse the executor status box | 23:54 |
clarkb | fungi: yup seems to be doing its job | 23:54 |
*** sarob has quit IRC | 23:54 | |
jeblair | "master + 115 computers (7 of 8 executors)" | 23:54 |
jeblair | no idea what "7 of 8 executors" means. | 23:55 |
* fungi nods. +/- glyphs | 23:55 | |
fungi | dunno, but it's fancified | 23:55 |
clarkb | jeblair: now, do you want to let 01 burn in? | 23:55 |
*** sarob has joined #openstack-infra | 23:56 | |
jeblair | clarkb: yeah, i kinda do. see what it looks like after a few hours/days of thrashing | 23:56 |
clarkb | wfm | 23:56 |
fungi | also, once we're done upgrading 01 and 02 we should not forget poor jenkins.o.o | 23:56 |
fungi | but the weekend (or at least a night of churning through the gate) should give us some idea | 23:56 |
jeblair | clarkb: hopefully if something does go wrong, 02 will continue to keep things going | 23:56 |
clarkb | I will do my best to make time to check in and help upgrade the others on Monday | 23:57 |
*** esker has quit IRC | 23:57 | |
fungi | clarkb: where in ca (also, is that the state code or country code)? | 23:57 |
clarkb | fungi: the state | 23:58 |
fungi | i need to know whether to send sheriffs or mounties | 23:58 |
clarkb | I will be in sunnyvale | 23:58 |
jeblair | there are only 536 threads on 01 compared to 1,869 on 02 | 23:58 |
clarkb | jeblair: nice | 23:58 |
fungi | oh, sounds work-related. apologies | 23:58 |
clarkb | fungi: it is! but it is work related in a good way | 23:58 |
fungi | get zaro a fresh laptop as a souvenir | 23:58 |
clarkb | Should have lots of time to sit with AaronGr and go over all the things | 23:59 |
fungi | nice | 23:59 |
AaronGr | clarkb: exciting. | 23:59 |