fungi | zuul seems to have restarted jobs from jenkins01, as far as i can tell from the status page | 00:00 |
---|---|---|
*** paul-- has joined #openstack-infra | 00:00 | |
fungi | is that new behavior, or am i seeing fairies? | 00:00 |
clarkb | yeah it is running a bunch of jobs | 00:01 |
fungi | i want to say the last time a jenkins master fell over we had jobs stuck in a running state in zuul until we restarted it | 00:01 |
*** oubiwann-lambda has quit IRC | 00:01 | |
fungi | (until we restarted zuul i mean) | 00:01 |
*** nicedice has quit IRC | 00:02 | |
clarkb | we need to make sure that the things that feed off of zmq reconnected | 00:02 |
fungi | nodepool needed a restart last time, right? | 00:02 |
clarkb | yeah and the logstash gearman client | 00:03 |
*** nicedice has joined #openstack-infra | 00:03 | |
clarkb | zmq is supposed to avoid these problems but ugh | 00:03 |
fungi | i can go punt it now | 00:03 |
clarkb | fungi: well we should be able to check if it is connected I think | 00:03 |
fungi | or do we want to wait and see if nodepool retains sanity this time? | 00:03 |
fungi | yeah, that | 00:03 |
*** UtahDave has quit IRC | 00:04 | |
jeblair | i'll check on nodepool | 00:04 |
*** pabelanger has joined #openstack-infra | 00:05 | |
fungi | looks like we're also down four precise nodes across the two masters | 00:06 |
fungi | i'll get them back up and working at some point this evening | 00:06 |
jeblair | nodepool is receiving zmq events from both masters, and is currently assigning nodes to both of them. | 00:07 |
jeblair | (btw, it correctly noted jenkins01 was down earlier and stopped trying to assign nodes to it) | 00:07 |
fungi | that is a significant improvement over last time | 00:07 |
clarkb | jeblair: woot | 00:08 |
*** mdenny has quit IRC | 00:08 | |
clarkb | logstash seems fine too | 00:08 |
*** pabelanger has quit IRC | 00:09 | |
clarkb | I think zmq reconnect mechanism works fine when the disappearance of the service is relatively short | 00:09 |
jeblair | and yeah, zuul should know to restart jobs if gearman fails, so that's (optimistically) expected behavior | 00:09 |
clarkb | it has problems when it is hours long | 00:09 |
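The reconnect behaviour being discussed here is handled by ZeroMQ itself: a SUB socket keeps retrying its connect in the background, so short outages of the publisher (the Jenkins ZMQ event plugin) are papered over automatically. A minimal sketch of such a subscriber follows; the endpoint and reconnect intervals are illustrative assumptions, not the actual nodepool or logstash configuration.

```python
import zmq

# Hypothetical publisher endpoint; the real masters/ports may differ.
JENKINS_ZMQ_EVENTS = "tcp://jenkins01.example.org:8888"

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")             # receive all events
sub.setsockopt(zmq.RECONNECT_IVL, 1000)        # retry every second at first...
sub.setsockopt(zmq.RECONNECT_IVL_MAX, 60000)   # ...backing off to one minute
sub.connect(JENKINS_ZMQ_EVENTS)

while True:
    # Blocks until an event arrives; zmq transparently reconnects
    # if the publisher goes away and comes back.
    topic, _, body = sub.recv().partition(b" ")
    print(topic.decode(), body.decode())
```

As the conversation notes, the socket-level reconnect is not the whole story: consumers can still end up with stale higher-level state after an hours-long outage.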
*** blamar has quit IRC | 00:10 | |
clarkb | https://review.openstack.org/#/c/61321/1 is the change to fix nodepool install on jenkins-dev | 00:10 |
jeblair | clarkb: how did you notice jenkins01 died? | 00:10 |
openstackgerrit | A change was merged to openstack-infra/config: Enable patchset-created for #openstack-state-management channel https://review.openstack.org/61605 | 00:12 |
*** jgrimm has quit IRC | 00:13 | |
clarkb | jeblair: I was going to hold d-g nodes in hopes of debugging the tempest ssh test, and couldn't open the web UI for jenkins01 to find which nodes needed holding | 00:13 |
*** bnemec has quit IRC | 00:14 | |
anteaya | AaronGr: you still around? | 00:14 |
AaronGr | anteaya: hi, i am. | 00:15 |
fungi | clarkb: zaro: does someone still need to test-drive https://review.openstack.org/61321 on jenkins-dev to confirm it puppets successfully? | 00:15 |
jeblair | clarkb: neat. so basically we just lost half our infrastructure and it was pretty much only noticeable by admins. that makes me happy. :) | 00:15 |
clarkb | fungi: probably, I have also -1'd it because I think it needs a few tweaks | 00:15 |
fungi | oh | 00:15 |
clarkb | zaro: can you address those comments? | 00:16 |
anteaya | AaronGr: to help you parse the above, the path of a commit is user -> Gerrit -> Zuul -> Jenkins job running on a node provided by node pool | 00:16 |
fungi | yes, looks like you just did that | 00:16 |
anteaya | AaronGr: that is simplified but a place to begin for now | 00:16 |
clarkb | jeblair: yup I think I caught it within a couple minutes of it happening | 00:16 |
anteaya | AaronGr: every new patch submitted follows that path for testing | 00:16 |
jeblair | clarkb: i think you did too, but i meant that it seems like we're making headway in increasing fault tolerance. | 00:17 |
clarkb | yup | 00:17 |
AaronGr | anteaya: gerrit = review, zuul = scheduler for jenkins tasks? | 00:17 |
*** bnemec has joined #openstack-infra | 00:18 | |
anteaya | AaronGr: gerrit == review | 00:18 |
anteaya | not sure if I could call zuul a scheduler or not, but definitely a co-ordination layer between Gerrit and Jenkins jobs | 00:18 |
fungi | "scheduler" is not an inappropriate term for it | 00:19 |
*** dstanek has quit IRC | 00:20 | |
AaronGr | sorry, by scheduler i meant 'mechanism for submitting a task to run a jenkins job', in this case to validate a reviewed patch. | 00:20 |
fungi | in fact, http://ci.openstack.org/zuul/ starts out in its introduction, "The main component of Zuul is the scheduler." (so it's more than a scheduler, but that's a lot of it) | 00:20 |
AaronGr | assuming it comes back without failing, what's the next step? (i am taking the ci page line at a time, hadn't hit Z yet *grins*) | 00:21 |
anteaya | AaronGr: that can work for now, but as you understand more prepare to refine the definition | 00:21 |
clarkb | zaro: if you aren't able to make those changes I think I will quickly make them | 00:21 |
anteaya | logs get posted to the static logs server | 00:21 |
AaronGr | anteaya: absolutely, this is helping to balance out what i've been reading with a condensed set of steps. | 00:21 |
anteaya | zuul tells gerrit what to write on the comment on the patch, attributed to Jenkins | 00:22 |
anteaya | AaronGr: great | 00:22 |
anteaya | helps me to say it out loud too | 00:22 |
anteaya | learning by teaching | 00:22 |
AaronGr | so back to gerrit, and then it goes through someone who does the final merge? | 00:22 |
anteaya | so pass or fail, that is what happens | 00:23 |
AaronGr | (appreciated) | 00:23 |
fungi | anteaya: AaronGr: actually, http://docs.openstack.org/infra/publications/overview/ is a good high-level presentation on those topics | 00:23 |
anteaya | AaronGr: a person approves, the patch runs through the gate, if passed it is merged as part of the jenkins job | 00:23 |
anteaya | no human merges | 00:23 |
fungi | it's the "how we try to explain infrastructure to a room full of people in 30 minutes to an hour" | 00:23 |
AaronGr | fungi: ok, the explanations here help the pictures make sense. | 00:24 |
fungi | (complete with pretty pictures) | 00:24 |
*** openstackgerrit has quit IRC | 00:24 | |
fungi | ahh, cool, didn't know if you'd found the presentations yet | 00:24 |
jeblair | i think there are youtube videos of us giving that presentation. | 00:24 |
*** openstackgerrit has joined #openstack-infra | 00:24 | |
*** ekarlso has quit IRC | 00:25 | |
*** ekarlso has joined #openstack-infra | 00:25 | |
AaronGr | so code -> gerrit -> jenkins test -> gerrit -> jenkins -> codebase looks like the oneliner | 00:25 |
AaronGr | (assuming 2 levels of review and no bugs) | 00:25 |
anteaya | I think you have a basic understanding | 00:27 |
*** herndon has quit IRC | 00:27 | |
jeblair | clarkb: after only 4 training runs, i'm starting to get results like this from crm114: | 00:27 |
AaronGr | nice! | 00:27 |
jeblair | bad 1.0000 5.3426 2013-12-11 21:45:56.664 | Details: {u'conflictingRequest': {u'message': u"Cannot 'rebuild' while instance is in task_state rebuilding", u'code': 409}} | 00:27 |
fungi | neat! | 00:27 |
jeblair | the 1.0 is a rather high probability that line is associated with a failure | 00:28 |
*** reed has quit IRC | 00:28 | |
fungi | it's picking up quickly | 00:28 |
clarkb | jeblair: nice, would it be possible to make that an elasticsearch column? | 00:29 |
jeblair | fungi: knowing the answers ahead of time helps. :) and actually changes the problem a bit. | 00:29 |
clarkb | jeblair: right you can train on any job that was successful | 00:29 |
fungi | jeblair: well, true. it's more ham vs spam at that stage | 00:29 |
jeblair | clarkb: that's what i'm thinking | 00:29 |
clarkb | jeblair: I am thinking that if we have a numeric elasticsearch column with some probability then we can search based on that | 00:30 |
clarkb | lucene can do >=0.8 for example on numeric fields iirc | 00:30 |
fungi | oh, neat. yeah filter or sort on bayesian score | 00:30 |
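If the crm114 score were indexed as a numeric field, the Lucene-style range query clarkb mentions could be issued against Elasticsearch roughly like this. The endpoint, index pattern, and the field name `crm114_score` are assumptions for illustration, not the deployed schema.

```python
import json
import requests

# Hypothetical Elasticsearch endpoint and index pattern.
ES_URL = "http://elasticsearch.example.org:9200/logstash-*/_search"

query = {
    "size": 10,
    "query": {
        "query_string": {
            # Lucene numeric range: score of 0.8 or higher
            "query": 'crm114_score:[0.8 TO *] AND filename:"console.html"'
        }
    },
}

resp = requests.post(ES_URL, data=json.dumps(query))
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"].get("message", ""))
```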
lifeless | jeblair: ohh what are you training? | 00:30 |
*** ArxCruz has joined #openstack-infra | 00:31 | |
jeblair | lifeless: i'm seeing if crm114 can help identify log lines that indicate failure; it's something we talked about at the havana summit but needed logstash to exist first. | 00:32 |
anteaya | hey ArxCruz, I'm going to be sending people your way if they have questions about setting up their own infra | 00:32 |
anteaya | you and lifeless | 00:33 |
anteaya | hope that is okay with you | 00:33 |
ArxCruz | anteaya: sure, what's happening ? | 00:33 |
anteaya | ArxCruz: neutron needs all the plugin developers to provide 3rd party testing, there are a few of them | 00:33 |
*** oubiwan__ has joined #openstack-infra | 00:34 | |
anteaya | we suggested they set up their own infra using zuul, devstack-gate and nodepool to do so | 00:34 |
ArxCruz | anteaya: i know IBM is doing something, I've been contacted by some colleagues | 00:34 |
anteaya | ArxCruz: cool | 00:34 |
anteaya | yes if IBM has a neutron plugin they will have to test it | 00:34 |
*** rcleere has quit IRC | 00:35 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 00:35 |
clarkb | zaro: fungi ^ | 00:35 |
fungi | anteaya: i take it the gerrit jenkins plugin solution described on the third-party testing howto was insufficient for most of them? | 00:35 |
*** vipul is now known as vipul-away | 00:35 | |
*** vipul-away is now known as vipul | 00:35 | |
anteaya | fungi: I don't think that even came up | 00:37 |
anteaya | ArxCruz: fungi meeting logs: http://eavesdrop.openstack.org/meetings/networking_third_party_testing/2013/networking_third_party_testing.2013-12-12-17.00.log.html | 00:37 |
lifeless | jeblair: sweeet | 00:37 |
anteaya | etherpad: https://etherpad.openstack.org/p/multi-node-neutron-tempest | 00:37 |
fungi | anteaya: http://ci.openstack.org/third_party.html | 00:37 |
clarkb | fungi: there seems to be a large lack of reading prior art and a lot of what do we do | 00:37 |
* anteaya clicks | 00:37 | |
anteaya | clarkb: large lack | 00:38 |
lifeless | anteaya: so, I will answer questions as needed, but folk should come here firstly. | 00:38 |
lifeless | anteaya: this is the community forum for discussion | 00:38 |
anteaya | fungi: I hadn't read that before either, that looks so simple | 00:39 |
clarkb | jeblair: the more you say about it the more I am interested :) curious to know what your plan for piping the data through crm114 is and what the crm114 setup looks like (iirc crm114 can do several different types of filters) | 00:39 |
clarkb | jeblair: but don't let me distract you | 00:39 |
anteaya | fungi: so all they would need is their own Jenkins with this trigger plugin? | 00:39 |
anteaya | lifeless: understood | 00:40 |
fungi | anteaya: plus stuff for their jenkins to run, and a place to post their logs | 00:40 |
anteaya | I find it helpful to give them names, otherwise they tend to stay silent and just write it all themselves | 00:40 |
clarkb | fungi: a lot of them want to do things that don't lend well to the trigger plugin | 00:40 |
anteaya | fungi: right, so simple | 00:40 |
anteaya | that is true too | 00:41 |
clarkb | they want multi-node baremetal testing with single-use environments and the ability for granular control over what events trigger specific jobs | 00:41 |
clarkb | tl;dr I really think they should look at zuul devstack-gate and nodepool | 00:41 |
anteaya | in any case they should be in here asking questions | 00:41 |
fungi | got it | 00:41 |
anteaya | so brace for onslaught | 00:41 |
anteaya | at least I hope they show up in here asking questions | 00:42 |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/config: Add welcome_message.py to patchset-created trigger https://review.openstack.org/61898 | 00:42 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 00:44 |
clarkb | that should fix the lint error I hope | 00:44 |
*** dstanek has joined #openstack-infra | 00:44 | |
*** sarob has quit IRC | 00:46 | |
*** sarob has joined #openstack-infra | 00:46 | |
*** dstanek has quit IRC | 00:49 | |
*** senk has joined #openstack-infra | 00:50 | |
*** sarob has quit IRC | 00:51 | |
*** vipul is now known as vipul-away | 00:51 | |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Ghe is tripleo-core now. https://review.openstack.org/61900 | 00:51 |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/jeepyb: Add dryrun flag to welcome_message.py https://review.openstack.org/61901 | 00:52 |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/config: Add welcome_message.py to patchset-created trigger https://review.openstack.org/61898 | 00:53 |
*** vipul-away is now known as vipul | 00:53 | |
clarkb | sdague: jeblair fungi https://bugs.launchpad.net/devstack/+bug/1253482 see my last comment there | 00:53 |
uvirtbot | Launchpad bug 1253482 in keystone "Keystone default port in linux local ephemeral port range. Devstack should shift range." [Undecided,In progress] | 00:53 |
*** senk has quit IRC | 00:54 | |
fungi | clarkb: good point | 00:55 |
*** mriedem1 has quit IRC | 00:55 | |
fungi | nodepool would have them marked as used, but jenkins might have undone its marker for those used single-use slaves? | 00:56 |
jeblair | fungi: yeah, the gearman plugin marks them as offline. i'm guessing a restart marks them all online again. | 00:56 |
jeblair | (so not actually a nodepool thing but rather a jenkins thing) | 00:57 |
fungi | next time we crash or even reboot a jenkins master, should we nodepool-delete all ephemeral slaves associated with it before starting again? | 00:57 |
clarkb | jeblair: oh right | 00:57 |
clarkb | fungi: all used slaves | 00:57 |
fungi | right, that | 00:57 |
*** praneshp has quit IRC | 00:57 | |
jeblair | i'm trying to think of something nodepool could do, but the bad behavior is that jenkins brings all previously known slaves online immediately... | 00:58 |
jeblair | i think perhaps once we get to all-dynamic slaves, we could probably write a quick script to remove all slaves from the config before (re-)starting jenkins | 00:58 |
fungi | so, like, here in a moment when we put jenkins02 into prepare-for-shutdown, we should clear used 02 slaves out once it quiesces | 00:58 |
clarkb | jeblair: right. is nodepool doing a temporary offline or a normal offline? | 00:58 |
jeblair | fungi: if you put it in shutdown, they should all go away on their own. | 00:58 |
fungi | ahh | 00:59 |
clarkb | er sorry gearman plugin | 00:59 |
fungi | so really only in the case of unanticipated jenkins failure | 00:59 |
jeblair | clarkb: i think there is only "offline" and "disconnect"; so i think it's just doing offline. disconnect would be problematic. if there's an offline that's more than offline, i'm not familiar with it. | 00:59 |
*** mrodden has quit IRC | 00:59 | |
jeblair | clarkb: (but this part of jenkins is kind of a mess, with internal terms not lining up at all with ui elements, etc) | 01:00 |
clarkb | jeblair: the gui button says "Mark this node temporarily offline" | 01:00 |
clarkb | I am guessing that lines up to offline and ya disconnect would be bad | 01:00 |
jeblair | clarkb: i believe that's what's going on. | 01:00 |
fungi | why is disconnect particularly bad? | 01:01 |
jeblair | anyway, it sounds like we can make an improvement soon. | 01:01 |
jeblair | fungi: it'll stop the running job | 01:01 |
fungi | oh | 01:02 |
fungi | yeah, that's bad. okay, important safety tip | 01:02 |
jeblair | fungi: heh. yeah, this happens immediately when a job starts so that there's no race condition with doing this when it finishes. | 01:02 |
*** ^demon|lunch has quit IRC | 01:02 | |
*** blamar has joined #openstack-infra | 01:02 | |
fungi | sort of the jenkins equivalent of total protonic reversal. got it | 01:03 |
clarkb | would be nice if we had temporary offline functionality that wasn't temporary | 01:03 |
jeblair | we could probably change the node label too, but that's a lot of extra work for gearman-plugin. | 01:03 |
jeblair | may be worth looking into though. | 01:04 |
clarkb | jeblair: and in the long run probably better putting the effort into making jenkins more reliable | 01:04 |
jeblair | (or more gone) | 01:04 |
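For reference, the "mark offline" operation discussed above (as opposed to disconnecting, which would kill the running job) is also reachable through the Jenkins remote API. A rough sketch with python-jenkins, using placeholder URL, credentials, and node name:

```python
import jenkins  # python-jenkins

# Placeholder master URL and credentials.
server = jenkins.Jenkins("https://jenkins02.example.org",
                         username="admin", password="api-token")

node = "precise14"
info = server.get_node_info(node)
if not info["offline"]:
    # Equivalent of "Mark this node temporarily offline" in the UI;
    # any build already running on the node is allowed to finish.
    server.disable_node(node, msg="single-use node, taken out of rotation")
```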
clarkb | fungi: re https://review.openstack.org/#/c/61321/3 did you want to try applying that to nodepool.o.o and jenkins-dev? I can give it a shot tomorrow | 01:05 |
fungi | clarkb: doing now | 01:05 |
*** jhesketh_ has quit IRC | 01:05 | |
openstackgerrit | A change was merged to openstack-dev/hacking: Fix typos of comment in module core https://review.openstack.org/61111 | 01:06 |
*** jerryz__ has quit IRC | 01:06 | |
clarkb | fungi: I would --noop it first time just to make sure we didn't do anything silly :) | 01:06 |
fungi | yup | 01:07 |
anteaya | AaronGr: --noop is no op or no operation, it means a test which stands up a devstack on a node and returns true | 01:09 |
clarkb | anteaya: I think AaronGr is familiar with the puppets | 01:09 |
anteaya | AaronGr: its main purpose that I know of is a placeholder for further tests | 01:09 |
clarkb | in fact I think we might be able to bug him about making puppet stuff better >_> | 01:09 |
anteaya | clarkb: ah okay, will look for him to teach me about the puppets | 01:09 |
anteaya | thank you | 01:09 |
fungi | anteaya: in this case it means i want puppet to pretend to apply new configuration but not actually do it | 01:10 |
fungi | (no op is a very common term in computing) | 01:10 |
anteaya | the pretending to apply configuration is always so satisfying | 01:10 |
anteaya | ah sorry, my mistake then, it was new for me | 01:10 |
fungi | well, when it tells me what it's going to do without actually doing it (and then screwing things up), yes satisfying ;) | 01:10 |
anteaya | guess I am sharing the stuff I wish I knew | 01:11 |
anteaya | hehe | 01:11 |
AaronGr | anteaya: thankfully, one thing i am bringing with me is a bit of puppet, been actively using it for about a year (i run a puppetmaster in my house, for my home network) | 01:13 |
fungi | mmm, puppet agent was not even running on jenkins-dev. going to update it from production before i --noop | 01:13 |
anteaya | AaronGr: awesome, I will ask you many stupid puppet questions | 01:13 |
anteaya | get ready | 01:13 |
AaronGr | anteaya: fair trade, you're welcome to anything i know. have looked through about 40% of infra/config -- i saw at least 10 spots modules could get rewritten or refactored easily | 01:14 |
* StevenK waits for "Is the puppet made out of oak, pine or maple?" | 01:14 | |
AaronGr | plus some really cool places to use more hiera. | 01:14 |
anteaya | AaronGr: awesome | 01:14 |
fungi | ahh, i think the agent must have been left stopped while testing the previously-broken nodepool addition | 01:14 |
anteaya | hiera I don't understand at all, so if you do, power to you | 01:14 |
anteaya | AaronGr: I have a few infra bugs with my name on it that you might like, puppety stuff | 01:15 |
anteaya | AaronGr: been in #openstack-neutron since the summit, hard to wear two hats, at least for me | 01:15 |
*** mrodden has joined #openstack-infra | 01:15 | |
*** oubiwan__ has quit IRC | 01:15 | |
clarkb | fungi: ya that was probably me | 01:16 |
AaronGr | anteaya: i'll happily take them, though not until monday, when i get fully up to speed. after that, pour them on. | 01:16 |
*** mrodden has quit IRC | 01:16 | |
anteaya | AaronGr: fair enough | 01:16 |
fungi | clarkb: see review comment. jenkins_dev_api_user | 01:17 |
*** mrodden has joined #openstack-infra | 01:17 | |
*** ljjjustin has joined #openstack-infra | 01:18 | |
clarkb | fungi: looking | 01:19 |
fungi | clarkb: playing around with fixing it now. i think the template needs to just not use _dev on its vars | 01:19 |
clarkb | fungi: oh right, because we collapsed the variables in puppet | 01:20 |
fungi | yep, those three lines need fixing, but that's not all. new errors once i do | 01:20 |
clarkb | woo | 01:20 |
fungi | updated comments with the new errors | 01:23 |
fungi | though perhaps those are an artifact of --noop | 01:23 |
fungi | ? | 01:23 |
*** ryanpetrello has quit IRC | 01:24 | |
fungi | i can try dropping the --noop and seeing if it applies cleanly | 01:24 |
clarkb | ya those look like artifacts of the --noop | 01:24 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 01:24 |
clarkb | fungi: ^ that removes the dev vars from the erb | 01:24 |
*** jhesketh has joined #openstack-infra | 01:25 | |
*** hogepodge has quit IRC | 01:25 | |
*** oubiwan__ has joined #openstack-infra | 01:26 | |
Alex_Gaynor | So the gate is about 12 hours behind real time. Is that entirely because of resets, or other causes? | 01:28 |
clarkb | Alex_Gaynor: mostly resets | 01:29 |
fungi | clarkb: yet still more new error comment | 01:29 |
clarkb | the sphinx thing and changes getting approved anyways really made it thrash yesterday | 01:29 |
*** syerrapragada1 has quit IRC | 01:29 | |
*** syerrapragada has joined #openstack-infra | 01:30 | |
clarkb | fungi: if you can pip install by hand does it work? | 01:30 |
*** syerrapragada has left #openstack-infra | 01:31 | |
*** praneshp has joined #openstack-infra | 01:31 | |
fungi | it may be missing dependencies for compiling libzmq | 01:31 |
clarkb | that could be | 01:31 |
fungi | ahh, yeah | 01:31 |
fungi | gcc: error trying to exec 'cc1plus': execvp: No such file or directory | 01:31 |
fungi | grah | 01:32 |
clarkb | don't we put build-essential everywhere? | 01:32 |
clarkb | fungi: curious why that wasn't a problem on nodepool.o.o | 01:33 |
fungi | Installed: (none) | 01:34 |
fungi | Candidate: 11.5ubuntu2.1 | 01:34 |
*** praneshp_ has joined #openstack-infra | 01:34 | |
fungi | as opposed to nodepool.o.o, Installed: 11.5ubuntu2.1 | 01:34 |
fungi | so, no, jenkins-dev has nothing telling it to install build-essential apparently | 01:35 |
*** weshay has quit IRC | 01:35 | |
*** praneshp has quit IRC | 01:36 | |
*** praneshp_ is now known as praneshp | 01:36 | |
clarkb | interesting | 01:36 |
*** syerrapragada1 has joined #openstack-infra | 01:36 | |
fungi | clarkb: also, do we still want to restart jenkins02? if so, i can go ahead and put it in shutdown now | 01:37 |
clarkb | I wonder if jeblair installed that by hand, git grep doesn't show it anywhere that nodepool.o.o would pick it up on | 01:37 |
clarkb | fungi: sure | 01:37 |
clarkb | fungi: I am adding build-essential to the nodepool module now | 01:37 |
fungi | i got precise23 and precise40 back online in jenkins, but precise5 and precise9 don't seem to want to relaunch the slave agent even after rebooting (and i'm able to ssh into them fine) | 01:38 |
fungi | jenkins02 is in prepare for shutdown now | 01:38 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 01:38 |
clarkb | fungi: weird | 01:38 |
*** jhesketh__ has joined #openstack-infra | 01:38 | |
clarkb | try that ^ | 01:38 |
*** gyee has quit IRC | 01:38 | |
*** dims has quit IRC | 01:39 | |
*** syerrapragada1 has quit IRC | 01:39 | |
fungi | looks like we've got about 40 minutes to quiescence on jenkins02, based on most recently-started jobs | 01:39 |
fungi | clarkb: success | 01:44 |
clarkb | woot | 01:44 |
fungi | hrm, though nodepool's still not installed | 01:44 |
fungi | that's... no good | 01:44 |
clarkb | oh because the repo didn't refresh the installer? | 01:45 |
clarkb | you can probably just delete the repo and make it reclone | 01:45 |
*** dstanek has joined #openstack-infra | 01:45 | |
fungi | good call | 01:45 |
fungi | i thought it finished rather quickly on that run :( | 01:45 |
clarkb | we should just make everything stateless | 01:45 |
*** xchu has joined #openstack-infra | 01:46 | |
fungi | nodepool==3871acf | 01:47 |
*** sdake_ has quit IRC | 01:47 | |
fungi | much better | 01:47 |
*** sdake_ has joined #openstack-infra | 01:47 | |
*** sdake_ has quit IRC | 01:47 | |
*** sdake_ has joined #openstack-infra | 01:47 | |
fungi | and nodepool list works (though the list is of course empty at the moment) | 01:47 |
fungi | alien-list and alien-image-list return entries though, so auth is definitely sane | 01:48 |
clarkb | fungi: btw what was the process for getting the credential id? did you add a credential to jenkins-dev then go grab an id out of the xml? | 01:48 |
fungi | clarkb: yes | 01:49 |
*** dstanek has quit IRC | 01:50 | |
fungi | i figured out where to find it first by grep'ing the prod one out of jenkins01, then confirmed that jenkins-dev had none, then went into manage credentials and added one which matched the settings in the jenkins01 webui, then fished it out of the xml after that | 01:50 |
fungi | and bob's your uncle | 01:50 |
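The id fungi fished out lives in the credentials store on the Jenkins master. A quick way to list candidate ids from that file is sketched below; the path and element names are assumptions based on typical Jenkins layouts, not verified against this deployment.

```python
import xml.etree.ElementTree as ET

# Typical location on a Jenkins master; adjust for the actual install.
CREDENTIALS_XML = "/var/lib/jenkins/credentials.xml"

tree = ET.parse(CREDENTIALS_XML)
for entry in tree.iter():
    # Credential entries carry an <id> child plus a human-readable
    # <description>; print both so the right one can be matched up
    # against what was entered in the "manage credentials" UI.
    cred_id = entry.findtext("id")
    if cred_id:
        print(cred_id, entry.findtext("description", default=""))
```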
clarkb | fungi: I wonder if we need to change min-ready to 1 in the nodepool config | 01:51 |
fungi | probably | 01:51 |
clarkb | is nodepool building an image currently? | 01:51 |
fungi | it didn't seem to be when i looked, but i'll look again | 01:51 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: fix installation of nodepool on jenkins-dev https://review.openstack.org/61321 | 01:51 |
fungi | nope. image-list is still empty | 01:51 |
clarkb | that bumps the min-ready number | 01:51 |
fungi | testing | 01:51 |
clarkb | I bet that fixes the image-listing | 01:51 |
*** dims has joined #openstack-infra | 01:52 | |
*** senk has joined #openstack-infra | 01:52 | |
fungi | aha, no... nodepool daemon didn't start | 01:54 |
fungi | clarkb: once i *started* the nodepool initscript, it began to build an image | 01:56 |
fungi | did we skip an ensure => running? | 01:56 |
clarkb | fungi: possibly. I know jeblair isn't a fan of ensure => running | 01:57 |
fungi | yup, modules/nodepool/manifests/init.pp doesn't do it | 01:57 |
fungi | okay, mystery solved | 01:57 |
*** senk has quit IRC | 01:58 | |
*** dstanek has joined #openstack-infra | 01:58 | |
*** senk has joined #openstack-infra | 02:00 | |
*** AaronGr is now known as AaronGr_afk | 02:01 | |
*** CaptTofu has joined #openstack-infra | 02:03 | |
*** yongli has quit IRC | 02:05 | |
*** locke105 has joined #openstack-infra | 02:09 | |
*** senk has quit IRC | 02:10 | |
*** mrodden1 has joined #openstack-infra | 02:16 | |
*** WarrenUsui has quit IRC | 02:17 | |
*** sdake_ has quit IRC | 02:17 | |
*** WarrenUsui has joined #openstack-infra | 02:18 | |
*** mrodden has quit IRC | 02:18 | |
*** senk has joined #openstack-infra | 02:20 | |
*** senk has quit IRC | 02:24 | |
*** senk has joined #openstack-infra | 02:24 | |
*** senk1 has joined #openstack-infra | 02:28 | |
*** senk has quit IRC | 02:29 | |
*** yaguang has joined #openstack-infra | 02:29 | |
*** senk1 has quit IRC | 02:31 | |
*** mriedem has joined #openstack-infra | 02:31 | |
*** senk has joined #openstack-infra | 02:32 | |
*** reed has joined #openstack-infra | 02:36 | |
*** senk has quit IRC | 02:37 | |
*** CaptTofu_ has joined #openstack-infra | 02:42 | |
*** bingbu has joined #openstack-infra | 02:43 | |
*** CaptTofu has quit IRC | 02:44 | |
*** guohliu has joined #openstack-infra | 02:45 | |
*** sarob has joined #openstack-infra | 02:46 | |
*** SushilKM has joined #openstack-infra | 02:57 | |
*** yongli has joined #openstack-infra | 02:58 | |
*** yamahata_ has quit IRC | 03:05 | |
*** beagles has quit IRC | 03:08 | |
*** b3nt_pin has joined #openstack-infra | 03:09 | |
*** b3nt_pin is now known as beagles | 03:09 | |
*** mestery_ has joined #openstack-infra | 03:11 | |
*** mestery has quit IRC | 03:14 | |
*** sdake_ has joined #openstack-infra | 03:16 | |
*** pcrews has quit IRC | 03:16 | |
*** dkliban has quit IRC | 03:18 | |
mordred | jeblair: I support your crm114 efforts. that's really f-ing cool | 03:22 |
clarkb | mordred: it is incredibly cool. I will owe jeblair lots of alcohol I bet | 03:23 |
mordred | clarkb: ++ | 03:23 |
clarkb | mordred: one of the things I have been incredibly happy about the whole logstash elasticsearch thing is that it has enabled folks to hack on it in simple ways without needing too many crazy workarounds for eg logs behind apache | 03:25 |
clarkb | I think that portion of the system has turned out well. It isn't all perfect though. A lot of the data could be modeled with relations and we don't have that | 03:25 |
mordred | clarkb: yah. it's one of the coolest things ever | 03:25 |
mordred | clarkb: I think it also goes to show the power of logging in sane ways | 03:25 |
anteaya | mordred: what country are you in? | 03:27 |
anteaya | good thing you aren't a micro manager, I haven't talked to you in a month | 03:28 |
StevenK | Last I heard, it was .es, but that could have changed | 03:28 |
mordred | anteaya: spain. flying out in a few hours | 03:28 |
mordred | StevenK: I see you've already started playing everyone's favorite game "Where in the world is mordred?" | 03:28 |
mordred | anteaya: that's right - would you like me to micro-manage more? | 03:29 |
mordred | anteaya: go do things! | 03:29 |
mordred | that's all I've got | 03:29 |
clarkb | ha | 03:29 |
clarkb | even when he tries he isn't able :) | 03:29 |
*** dkliban has joined #openstack-infra | 03:30 | |
anteaya | mordred: great, whew thanks for that | 03:30 |
anteaya | I feel better know | 03:30 |
anteaya | now | 03:30 |
mordred | clarkb: do - uhm - different things. perhaps scrumming something is a good choice? | 03:30 |
mordred | clarkb: or kanban. definitely you should kanban something | 03:30 |
clarkb | mordred: got it | 03:30 |
mordred | phew | 03:30 |
* mordred wins | 03:30 | |
clarkb | fungi: mordred wants us to put up a board with post its. do you have room in your lab? | 03:31 |
* mordred has kanbanned his employees employing scrum methodology | 03:31 | |
clarkb | fungi: then we can build a robot to move things around for us | 03:31 |
mordred | clarkb: only if the robot speaks japanese | 03:31 |
*** AlexF has joined #openstack-infra | 03:32 | |
fungi | post-it robot, got it | 03:33 |
fungi | tomorrow maybe | 03:33 |
fungi | mordred: agile something something/ | 03:33 |
StevenK | Agile Robot-Driven Development ? | 03:34 |
fungi | (...kill all humans...) | 03:35 |
fungi | yes | 03:36 |
clarkb | more evidence that everyone from NC is a robot | 03:36 |
anteaya | well at least fungi is online a lot | 03:36 |
anteaya | and mostly gives kind answers | 03:37 |
anteaya | who am I to judge his robot internals | 03:37 |
StevenK | Haha | 03:37 |
*** changbl has joined #openstack-infra | 03:37 | |
*** ryanpetrello has joined #openstack-infra | 03:37 | |
fungi | in the south we say "rowbut" | 03:37 |
clarkb | fungi: like zoidberg | 03:37 |
StevenK | Zoidberg is more 'robbit' | 03:37 |
anteaya | oh I like rowbut | 03:38 |
fungi | newqular rowbuts | 03:38 |
anteaya | robbit rabbit hobbit | 03:38 |
anteaya | ha ha ha | 03:38 |
*** mestery has joined #openstack-infra | 03:39 | |
*** ArxCruz has quit IRC | 03:41 | |
*** mestery_ has quit IRC | 03:42 | |
*** AlexF has quit IRC | 03:45 | |
*** jhesketh__ has quit IRC | 03:52 | |
*** mriedem has quit IRC | 03:52 | |
*** jhesketh__ has joined #openstack-infra | 03:53 | |
*** pabelanger_ has joined #openstack-infra | 03:56 | |
*** weshay has joined #openstack-infra | 03:59 | |
*** sarob has quit IRC | 04:01 | |
*** sarob has joined #openstack-infra | 04:01 | |
*** krtaylor has joined #openstack-infra | 04:02 | |
*** pabelanger_ has quit IRC | 04:02 | |
*** sarob has quit IRC | 04:06 | |
*** sarob has joined #openstack-infra | 04:06 | |
*** sarob has quit IRC | 04:11 | |
*** AaronGr_afk is now known as AaronGr | 04:13 | |
*** AaronGr is now known as AaronGr_Zzz | 04:13 | |
*** SushilKM has quit IRC | 04:15 | |
*** SushilKM has joined #openstack-infra | 04:17 | |
*** jcooley_ has joined #openstack-infra | 04:17 | |
*** SushilKM has quit IRC | 04:20 | |
*** SushilKM has joined #openstack-infra | 04:21 | |
*** sharwell has quit IRC | 04:22 | |
*** pabelanger_ has joined #openstack-infra | 04:22 | |
*** CaptTofu_ has quit IRC | 04:25 | |
*** SushilKM has quit IRC | 04:25 | |
*** CaptTofu has joined #openstack-infra | 04:25 | |
*** pabelanger__ has joined #openstack-infra | 04:27 | |
*** pabelanger__ has quit IRC | 04:27 | |
*** guohliu has quit IRC | 04:29 | |
*** CaptTofu has quit IRC | 04:30 | |
*** esker has joined #openstack-infra | 04:34 | |
*** dkliban has quit IRC | 04:40 | |
*** guohliu has joined #openstack-infra | 04:42 | |
*** dkliban has joined #openstack-infra | 04:43 | |
*** pabelanger_ has quit IRC | 04:44 | |
*** boris-42 has joined #openstack-infra | 04:49 | |
*** dkliban has quit IRC | 05:00 | |
*** sarob has joined #openstack-infra | 05:05 | |
openstackgerrit | Matthew Treinish proposed a change to openstack-infra/devstack-gate: Up the default concurrency on tempest runs https://review.openstack.org/58605 | 05:06 |
*** jcooley_ has quit IRC | 05:06 | |
*** sarob has quit IRC | 05:09 | |
*** sickboy3i has joined #openstack-infra | 05:10 | |
*** guohliu has quit IRC | 05:11 | |
*** sickboy3i has quit IRC | 05:11 | |
*** ryanpetrello has quit IRC | 05:13 | |
*** dstanek has quit IRC | 05:13 | |
*** jcooley_ has joined #openstack-infra | 05:15 | |
*** ryanpetrello has joined #openstack-infra | 05:19 | |
*** ljjjusti1 has joined #openstack-infra | 05:20 | |
*** weshay has quit IRC | 05:21 | |
*** ljjjustin has quit IRC | 05:21 | |
*** guohliu has joined #openstack-infra | 05:22 | |
*** SergeyLukjanov has joined #openstack-infra | 05:27 | |
*** dstanek has joined #openstack-infra | 05:29 | |
*** vkozhukalov has joined #openstack-infra | 05:30 | |
*** nicedice has quit IRC | 05:35 | |
*** reed has quit IRC | 05:36 | |
*** basha has joined #openstack-infra | 05:38 | |
basha | Hi, anyone around? | 05:38 |
clarkb | basha: sort of, whats up? | 05:38 |
basha | facing a small issue with a patch | 05:38 |
basha | clarkb: https://review.openstack.org/#/c/60188/1 | 05:38 |
basha | The jenkins seems to pass. | 05:39 |
basha | but when I look at the logs for python26/27 | 05:39 |
basha | it seems a lil weird | 05:39 |
basha | clarkb: ^^ | 05:39 |
*** Abhishek has joined #openstack-infra | 05:39 | |
*** talluri has joined #openstack-infra | 05:40 | |
clarkb | basha: I see hte logs and exceptions | 05:40 |
clarkb | but nose is reporting that the tests pass | 05:40 |
basha | isn't it a lil weird clarkb? Does this happen often? | 05:41 |
basha | btw whats hte logs? | 05:41 |
basha | :D | 05:41 |
clarkb | basha: I have no idea, that would be questions for glance | 05:41 |
zaro | clarkb: trying to use macbook, i'm sucking :( | 05:41 |
clarkb | zaro: I'm sorry, I can't help you with the aluminum blocks | 05:42 |
basha | clarkb: have u seen this happen before? | 05:42 |
clarkb | basha: infra runs the tests, we aren't typically very good at answering questions about test weirdness | 05:42 |
basha | clarkb: zaro : macs rock!! :P | 05:42 |
*** sdake_ has quit IRC | 05:42 | |
clarkb | the tests themselves fall under the responsibility of the project and the project itself would be most familiar | 05:43 |
*** dstanek has quit IRC | 05:43 | |
basha | clarkb: OK. I was just a lil puzzled that jenkins went green, but the logs seemed to be weird | 05:43 |
zaro | basha:i'm newbie. shortcut keys don't work same on weechat. | 05:43 |
clarkb | I would expect the exception at http://logs.openstack.org/88/60188/6/check/gate-glance-python27/bf13e3b/console.html#_2013-12-12_14_46_49_807 to cause the test to fail but nose doesn't agree with me | 05:43 |
clarkb | zaro: is this a loaner? | 05:43 |
zaro | clarkb: hopefully, but might be perm | 05:44 |
*** Abhishek has quit IRC | 05:44 | |
zaro | clarkb: tara says she's gonna try to get same hp again but she says it's unlikely | 05:44 |
basha | zaro: http://support.apple.com/kb/ht1343 | 05:44 |
clarkb | zaro: :( | 05:44 |
basha | hope that helps :P | 05:44 |
clarkb | basha: jenkins is just looking at the exit code of nose | 05:45 |
basha | clarkb: I've seen that fail before. | 05:45 |
zaro | http://support.apple.com/kb/ht1343 | 05:45 |
zaro | 05:37:47 clarkb | zaro: :( | 05:45 |
clarkb | basha: if nose reports success jenkins reports success, and nose is clearly reporting success | 05:45 |
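The pass/fail decision clarkb describes is purely the exit status of the test command: Jenkins only sees whether the process returned zero, not what got written to the console. A trivial illustration of that contract, with the command line as a placeholder (the real jobs invoke nose via tox):

```python
import subprocess

# Placeholder test command for illustration.
returncode = subprocess.call(["nosetests", "glance/tests/unit"])

# Tracebacks printed to the console (like the OperationalError above)
# don't matter here -- only the process exit status does.
if returncode == 0:
    print("Jenkins reports SUCCESS")
else:
    print("Jenkins reports FAILURE (exit code %d)" % returncode)
```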
zaro | gah! | 05:45 |
basha | clarkb: I guess thats an ignored test perhaps | 05:45 |
clarkb | basha: could also be a nose bug, nose is not the greatest test runner around | 05:46 |
clarkb | or a test bug | 05:46 |
basha | clarkb: I see migrations running, which I haven't seen happen in unit tests | 05:46 |
clarkb | basha: I think the DB migration tests depend on having a mysql and or postgres server laying around configured properly | 05:46 |
zaro | clarkb: my only other option was to get one of those bricks. | 05:47 |
*** dstanek has joined #openstack-infra | 05:47 | |
basha | clarkb: hmmmmm…. I'll look into this in a bit more detail and let u know :) | 05:47 |
clarkb | zaro: I would've gotten a brick :) | 05:47 |
basha | clarkb: thanks a lot | 05:47 |
zaro | basha: that did not help. trying to figure out why alt-j doesn't work in weechat. | 05:48 |
zaro | clarkb: are you kidding me? that thing is like 10 lbs. | 05:48 |
basha | zaro: I dont use weechat :D | 05:48 |
basha | :P | 05:48 |
clarkb | zaro: I wouldn't carry it anywhere | 05:50 |
clarkb | but at least I would have a useable machine at my desk | 05:50 |
clarkb | basha: http://logstash.openstack.org/#eyJzZWFyY2giOiIgYnVpbGRfbmFtZTpnYXRlLWdsYW5jZS1weXRob24yKiBBTkQgbWVzc2FnZTpcIk9wZXJhdGlvbmFsRXJyb3JcIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiODY0MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIiwic3RhbXAiOjEzODY5MTM4MTE2NTl9 looks like that exception | 05:50 |
clarkb | happens quite a bit | 05:50 |
*** yamahata_ has joined #openstack-infra | 05:51 | |
basha | clarkb: yeah. I've seen it break couple of times but the nose still passes. | 05:52 |
zaro | clarkb: my broken laptop can work as a desktop. | 05:53 |
*** jcooley_ has quit IRC | 05:54 | |
*** jcooley_ has joined #openstack-infra | 05:54 | |
clarkb | zaro: oh its just the display that is bad? I bet you could replace it | 05:57 |
zaro | clarkb: you got any experience with that? | 05:58 |
clarkb | sort of, you need to find a replacement display that is compatible, then when you do the teardown document everything, otherwise it doesn't go back together | 05:59 |
*** jcooley_ has quit IRC | 05:59 | |
zaro | display is probably 80% of cost anyway. | 06:00 |
clarkb | not really, those laptops have crappy cheap displays | 06:00 |
clarkb | the cpu and related peripherals are typically the costly bits | 06:00 |
zaro | yeah, they do. | 06:01 |
zaro | these things look pretty sealed. probably need special tools or something. | 06:01 |
*** jcooley_ has joined #openstack-infra | 06:02 | |
zaro | can't even scrollback on this mac | 06:03 |
*** dstanek has quit IRC | 06:03 | |
*** jcooley_ has quit IRC | 06:04 | |
*** jcooley_ has joined #openstack-infra | 06:04 | |
clarkb | https://www.laptopscreen.com/English/model/HP-Compaq/ELITEBOOK~FOLIO~9470M/ is the part | 06:05 |
*** sarob has joined #openstack-infra | 06:06 | |
*** Abhishek_ has joined #openstack-infra | 06:08 | |
zaro | how come it looks so easy on that image? i don't even see the screws on the display. | 06:08 |
clarkb | they always make it look easy :) | 06:10 |
clarkb | "flex the inside edges of the bottom edge (1), the left and right sides (2), and the top edge (3) of the display bezel until the display bezel disengages from the display enclosure" | 06:12 |
zaro | hey maybe it's the type of shell. mac default shell is xterm. what is it on ubuntu? | 06:12 |
clarkb | zaro: you were using konsole which probably presents itself as an xterm | 06:12 |
clarkb | swapping the display doesn't look too bad if you can pop the bezel off | 06:13 |
zaro | dang it! page up on mac scrolls the screen, not the backscroll | 06:13 |
zaro | pretty good pdx/hou game on tnt | 06:14 |
zaro | you're right about aldridge, he da man. | 06:16 |
clarkb | and batum and lillard and matthews | 06:18 |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Pin Sphinx. https://review.openstack.org/61921 | 06:19 |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Ghe is tripleo-core now. https://review.openstack.org/61900 | 06:19 |
*** jcooley_ has quit IRC | 06:19 | |
zaro | ok. i'm done mucking with this for tonight. good night. | 06:20 |
clarkb | night | 06:20 |
*** denis_makogon has joined #openstack-infra | 06:20 | |
*** vkozhukalov has quit IRC | 06:21 | |
*** slong_ has quit IRC | 06:24 | |
*** ryanpetrello has quit IRC | 06:28 | |
*** SushilKM has joined #openstack-infra | 06:39 | |
*** vogxn has joined #openstack-infra | 06:41 | |
*** basha has quit IRC | 06:42 | |
*** bingbu has quit IRC | 06:50 | |
*** sarob has quit IRC | 06:54 | |
*** basha has joined #openstack-infra | 06:54 | |
*** sarob has joined #openstack-infra | 06:55 | |
openstackgerrit | A change was merged to openstack/requirements: Add oslo.rootwrap to global requirements https://review.openstack.org/61738 | 06:59 |
*** sarob has quit IRC | 06:59 | |
*** NikitaKonovalov has joined #openstack-infra | 07:03 | |
*** basha has quit IRC | 07:05 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 07:05 | |
*** bingbu has joined #openstack-infra | 07:06 | |
*** _SergeyLukjanov has quit IRC | 07:06 | |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Pin Sphinx. https://review.openstack.org/61921 | 07:14 |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Ghe is tripleo-core now. https://review.openstack.org/61900 | 07:14 |
*** SergeyLukjanov has joined #openstack-infra | 07:19 | |
*** sarob has joined #openstack-infra | 07:25 | |
*** yolanda has joined #openstack-infra | 07:28 | |
*** dstanek has joined #openstack-infra | 07:30 | |
*** basha has joined #openstack-infra | 07:31 | |
*** basha has quit IRC | 07:31 | |
*** rcarrillocruz has joined #openstack-infra | 07:33 | |
*** dstanek has quit IRC | 07:35 | |
*** senk has joined #openstack-infra | 07:40 | |
*** bingbu has quit IRC | 07:41 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 07:44 | |
*** _SergeyLukjanov has quit IRC | 07:45 | |
*** sergmelikyan has joined #openstack-infra | 07:46 | |
sergmelikyan | >>/msg chanserv access #murano add openstackinfra +AFRfiorstv | 07:46 |
*** Abhishek_ has quit IRC | 07:46 | |
sergmelikyan | Why does the bot require such privileges? | 07:46 |
sergmelikyan | And are they required to merge https://review.openstack.org/61703? | 07:48 |
*** andreaf has joined #openstack-infra | 07:51 | |
*** vkozhukalov has joined #openstack-infra | 07:52 | |
*** oubiwan__ has quit IRC | 07:53 | |
*** dizquierdo has joined #openstack-infra | 07:54 | |
*** jcoufal has joined #openstack-infra | 07:55 | |
*** sarob has quit IRC | 07:57 | |
*** vkozhukalov has quit IRC | 08:02 | |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Adding an option to use qpid instead of rabbit or zeromq https://review.openstack.org/55829 | 08:04 |
*** flaper87|afk is now known as flaper87 | 08:06 | |
*** vogxn1 has joined #openstack-infra | 08:06 | |
*** vogxn has quit IRC | 08:08 | |
*** praneshp has quit IRC | 08:11 | |
*** vogxn1 has quit IRC | 08:11 | |
*** SergeyLukjanov has joined #openstack-infra | 08:12 | |
*** bingbu has joined #openstack-infra | 08:13 | |
*** vkozhukalov has joined #openstack-infra | 08:14 | |
*** praneshp has joined #openstack-infra | 08:16 | |
*** basha has joined #openstack-infra | 08:18 | |
*** nprivalova has joined #openstack-infra | 08:23 | |
*** denis_makogon has quit IRC | 08:25 | |
*** rcarrillocruz1 has joined #openstack-infra | 08:26 | |
*** sarob has joined #openstack-infra | 08:26 | |
*** rcarrillocruz has quit IRC | 08:28 | |
*** praneshp has quit IRC | 08:29 | |
*** rongze has joined #openstack-infra | 08:29 | |
*** senk has quit IRC | 08:30 | |
*** xchu has quit IRC | 08:31 | |
*** sarob has quit IRC | 08:34 | |
*** iv_m has joined #openstack-infra | 08:38 | |
*** bingbu has quit IRC | 08:38 | |
*** salv-orlando has joined #openstack-infra | 08:39 | |
*** jpich has joined #openstack-infra | 08:41 | |
*** sHellUx has joined #openstack-infra | 08:45 | |
SergeyLukjanov | fungi, mordred, clarkb, jeblair, hey guys | 08:45 |
SergeyLukjanov | Queue lengths: 245 events, 382 results | 08:45 |
SergeyLukjanov | ^^ in zuul, looks not very good | 08:46 |
SergeyLukjanov | many of jobs are failing with https://jenkins02.openstack.org/job/gate-cinder-docs/3172/console | 08:48 |
*** afazekas has joined #openstack-infra | 08:48 | |
*** yongli has quit IRC | 08:48 | |
*** dizquierdo has quit IRC | 08:49 | |
*** bingbu has joined #openstack-infra | 08:51 | |
*** nosnos has joined #openstack-infra | 08:55 | |
*** apevec has joined #openstack-infra | 08:58 | |
*** apevec has joined #openstack-infra | 08:58 | |
*** yassine has joined #openstack-infra | 08:58 | |
*** yassine has quit IRC | 09:00 | |
*** yassine has joined #openstack-infra | 09:00 | |
*** yassine has quit IRC | 09:02 | |
apevec | java.io.IOException: Remote call on precise14 failed - is that a broken Jenkins slave? | 09:03 |
apevec | http://logs.openstack.org/32/61532/1/gate/gate-heat-python27/5d7c9dc/console.html | 09:03 |
apevec | that failed reverification of 61532 which blocks Heat CVE fixes on stable/havana :( | 09:04 |
*** yamahata_ has quit IRC | 09:04 | |
*** rongze has quit IRC | 09:05 | |
*** yassine has joined #openstack-infra | 09:06 | |
*** yassine has quit IRC | 09:06 | |
openstackgerrit | Ruslan Kamaldinov proposed a change to openstack-infra/config: Add jenkins03, jenkins04 to cacti https://review.openstack.org/61938 | 09:07 |
*** yassine has joined #openstack-infra | 09:07 | |
openstackgerrit | Abhishek Chanda proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1249889 https://review.openstack.org/61939 | 09:10 |
uvirtbot | Launchpad bug 1249889 in tempest "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern[compute,image,volume] failed" [Undecided,Invalid] https://launchpad.net/bugs/1249889 | 09:10 |
*** kruskakli has left #openstack-infra | 09:12 | |
*** Abhishek_ has joined #openstack-infra | 09:14 | |
*** derekh has joined #openstack-infra | 09:16 | |
*** bingbu has quit IRC | 09:17 | |
*** SergeyLukjanov has quit IRC | 09:22 | |
*** sarob has joined #openstack-infra | 09:26 | |
*** jooools has joined #openstack-infra | 09:34 | |
*** rongze has joined #openstack-infra | 09:34 | |
*** sHellUx has quit IRC | 09:45 | |
*** hashar has joined #openstack-infra | 09:48 | |
*** zhiyan has joined #openstack-infra | 09:48 | |
*** rossella_s has joined #openstack-infra | 09:49 | |
*** nosnos has quit IRC | 09:51 | |
*** johnthetubaguy has joined #openstack-infra | 09:56 | |
*** sarob has quit IRC | 09:58 | |
*** saschpe_ has joined #openstack-infra | 10:01 | |
*** saschpe has quit IRC | 10:02 | |
*** ArxCruz has joined #openstack-infra | 10:06 | |
andreaf | hi - I'm working on a tempest change which has the following implication: listing servers requires tempest.conf to be available. gate-tempest-py27 is failing because tempest.conf is missing. Is it possible that the config file has not been generated yet when this check runs? I thought devstack would create tempest.conf at setup. What am I missing? | 10:07 |
*** masayukig has quit IRC | 10:08 | |
*** basha has quit IRC | 10:08 | |
*** apevec has quit IRC | 10:09 | |
*** SergeyLukjanov has joined #openstack-infra | 10:10 | |
*** jhesketh__ has quit IRC | 10:12 | |
openstackgerrit | Alexandre Levine proposed a change to openstack-infra/config: Adding empty gce-api project to stackforge https://review.openstack.org/61954 | 10:13 |
*** dizquierdo has joined #openstack-infra | 10:16 | |
*** nprivalova has quit IRC | 10:16 | |
*** apevec has joined #openstack-infra | 10:21 | |
*** apevec has joined #openstack-infra | 10:22 | |
openstackgerrit | Alexandre Levine proposed a change to openstack-infra/config: Adding empty gce-api project to stackforge https://review.openstack.org/61954 | 10:22 |
*** sarob has joined #openstack-infra | 10:26 | |
*** guohliu has quit IRC | 10:29 | |
apevec | ttx, thanks for filing bug 1260654 that slave seems really broken: https://jenkins02.openstack.org/computer/precise14/builds | 10:31 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,New] https://launchpad.net/bugs/1260654 | 10:31 |
*** sarob has quit IRC | 10:31 | |
apevec | only gate-noop works (what does it do?) | 10:31 |
*** ArxCruz has quit IRC | 10:31 | |
apevec | can that machine be removed from the pool? | 10:31 |
*** dstanek has joined #openstack-infra | 10:33 | |
ttx | apevec: it can, but not by me | 10:33 |
*** flaper87 is now known as flaper87|afk | 10:33 | |
ttx | We don't have a good answer yet for borked slaves in european mornings | 10:33 |
apevec | ok, then it will be Russian roulette in the gate | 10:33 |
ttx | since the people with power to kill them are not up | 10:33 |
ttx | mordred, fungi ^ | 10:34 |
apevec | license to kill | 10:34 |
*** ljjjusti1 has quit IRC | 10:35 | |
*** dstanek has quit IRC | 10:37 | |
*** nprivalova has joined #openstack-infra | 10:40 | |
BobBall | we need a batphone | 10:42 |
*** chandankumar has quit IRC | 10:43 | |
chmouel | ttx: we are trying to find resource here at eNovance who can help infra during european times | 10:45 |
*** senk has joined #openstack-infra | 10:45 | |
*** chandankumar has joined #openstack-infra | 10:46 | |
*** senk has quit IRC | 10:48 | |
*** senk has joined #openstack-infra | 10:49 | |
openstackgerrit | Vadim Rovachev proposed a change to openstack-infra/devstack-gate: Added ceilometer-anotification to enabled services https://review.openstack.org/61958 | 10:53 |
*** senk has quit IRC | 10:53 | |
*** paul-- has quit IRC | 10:56 | |
*** sergmelikyan has quit IRC | 11:00 | |
*** paul-- has joined #openstack-infra | 11:04 | |
*** marun has joined #openstack-infra | 11:05 | |
apevec | more bad slaves, now precise20 https://jenkins02.openstack.org/job/gate-nova-python27/13176/console | 11:05 |
apevec | Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.tools.ant.Location | 11:06 |
apevec | looks like it lost some java packages?? | 11:06 |
*** rongze has quit IRC | 11:06 | |
*** lcestari has joined #openstack-infra | 11:09 | |
*** markmc has joined #openstack-infra | 11:19 | |
ttx | ew | 11:21 |
openstackgerrit | Darragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Use yaml local tags to support including files https://review.openstack.org/48783 | 11:26 |
*** sarob has joined #openstack-infra | 11:26 | |
*** marun has quit IRC | 11:29 | |
*** rongze has joined #openstack-infra | 11:38 | |
*** katyafervent has quit IRC | 11:45 | |
*** afazekas has quit IRC | 11:47 | |
*** rongze has quit IRC | 11:49 | |
*** zhiyan has quit IRC | 11:49 | |
*** sandy__ has quit IRC | 11:53 | |
*** sandy__ has joined #openstack-infra | 11:54 | |
*** sandy__ has quit IRC | 11:57 | |
*** sarob has quit IRC | 11:58 | |
*** jcoufal has quit IRC | 12:01 | |
*** jcoufal has joined #openstack-infra | 12:02 | |
*** nprivalova has quit IRC | 12:03 | |
*** nprivalova has joined #openstack-infra | 12:04 | |
sdague | so was there a bug against openstack ci on the jenkins crash? | 12:05 |
yassine | Hello all, | 12:05 |
yassine | i got some issues with my patch https://review.openstack.org/#/c/60499 it looks like zookeeper package was not | 12:05 |
yassine | successfully installed in some slaves despite this patch which adds zookeeper in the Puppet manifest https://review.openstack.org/#/c/60509 for jenkins slaves. Is it a known issue? How could it be fixed? :/ | 12:05 |
apevec | sdague, ttx filed bug 1260654 for one instance of NoClassDefFoundError | 12:07 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,New] https://launchpad.net/bugs/1260654 | 12:07 |
sdague | apevec: I actually meant the reusing of the slaves | 12:07 |
sdague | which caused all the jobs to fail | 12:07 |
sdague | because right now we are sticking it on another bug | 12:08 |
sdague | which wasn't really the story | 12:08 |
apevec | oh, I don't know about "reusing of the slaves", what was that? | 12:09 |
*** fifieldt has quit IRC | 12:09 | |
sdague | last night, it's why everything failed for a while | 12:09 |
apevec | are the missing classes on the slave related or not? | 12:10 |
sdague | not sure | 12:11 |
*** ruhe has joined #openstack-infra | 12:13 | |
*** rongze has joined #openstack-infra | 12:15 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/elastic-recheck: add query for jenkins crash https://review.openstack.org/61974 | 12:16 |
*** ianw has quit IRC | 12:18 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: add query for jenkins crash https://review.openstack.org/61974 | 12:19 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 12:24 | |
*** dstanek has joined #openstack-infra | 12:24 | |
*** _SergeyLukjanov has quit IRC | 12:25 | |
*** mfer has joined #openstack-infra | 12:25 | |
*** sarob has joined #openstack-infra | 12:26 | |
*** SergeyLukjanov has joined #openstack-infra | 12:33 | |
*** Abhishek_ has quit IRC | 12:37 | |
*** thomasem has joined #openstack-infra | 12:48 | |
yassine | could someone please answer my question :$ | 12:49 |
*** jcoufal has quit IRC | 12:49 | |
openstackgerrit | Cyril Roelandt proposed a change to openstack/requirements: HTTPretty: update to 0.7.1 https://review.openstack.org/61981 | 12:49 |
*** jcoufal has joined #openstack-infra | 12:50 | |
*** HenryG has quit IRC | 12:51 | |
*** dolphm has joined #openstack-infra | 12:51 | |
*** sandywalsh has joined #openstack-infra | 12:53 | |
*** dizquierdo has quit IRC | 12:56 | |
*** HenryG has joined #openstack-infra | 12:58 | |
*** sarob has quit IRC | 12:59 | |
*** yaguang has quit IRC | 13:02 | |
*** dkliban has joined #openstack-infra | 13:02 | |
*** marun has joined #openstack-infra | 13:03 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Stories and Tasks search https://review.openstack.org/60515 | 13:05 |
*** dstanek has quit IRC | 13:06 | |
*** dstanek has joined #openstack-infra | 13:10 | |
*** oubiwan__ has joined #openstack-infra | 13:15 | |
*** sandywalsh has quit IRC | 13:15 | |
portante | jog0, clarkb, sdague, under what category do I file this bug: | 13:17 |
portante | http://logs.openstack.org/87/61587/1/gate/gate-swift-pep8/3291350/console.html | 13:17 |
sdague | yeh, we definitely need someone in .eu to get skilled up on infra. Maybe we tell mordred he has to move to barcelona :) | 13:21 |
ruhe | already answered in qa channel. for everyone else jenkins error is filed in https://bugs.launchpad.net/bugs/1260654 | 13:21 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,Confirmed] | 13:21 |
sdague | ruhe: thanks | 13:21 |
*** oubiwan__ has quit IRC | 13:22 | |
portante | sdague: can we get an elastic recheck for these kinds of infra bugs? they are likely to happen again in the future at some point | 13:22 |
*** dizquierdo has joined #openstack-infra | 13:23 | |
ruhe | sdague, we (mirantis) do plan to dedicate a couple of engineers to work on infra full-time, but it sure will take a lot of time to get accustomed to infra | 13:23 |
sdague | portante: there is one, but right now e-r only looks for details in tempest/devstack jobs | 13:24 |
dims | portante, its easy to submit a review against elastic-recheck, just need to add a yaml file - https://github.com/openstack-infra/elastic-recheck/tree/master/queries :) | 13:24 |
sdague | it's a future enhancement to have it look at all the jobs | 13:24 |
*** dcramer_ has joined #openstack-infra | 13:25 | |
sdague | see scroll back 16 lines where I added it to e-r | 13:25 |
*** derekh has quit IRC | 13:26 | |
*** sarob has joined #openstack-infra | 13:26 | |
*** esker has quit IRC | 13:27 | |
dims | sdague, ah cool. i just started looking at the gate queue and was expecting 50+ but found just a few, and was wondering when i saw this :) | 13:27 |
sdague | dims: yeh, so when jog0 started the assumption was the only things that actually failed in the gate were races caused by a real cloud | 13:27 |
sdague | i.e. there is no reason for docs and unit tests to fail in the gate, they should have passed in check | 13:28 |
sdague | but external events can make them fail (as well as bad reviewers) | 13:28 |
sdague | so they need to be added | 13:28 |
dims | sdague, right | 13:28 |
sdague | and that's part of the code which needs some more brutal refactoring to get there | 13:29 |
portante | if we can detect these events, can we avoid making users do a recheck and just have infra retry the job? | 13:29 |
sdague | portante: so the issue is we are skipping processing them on the elastic recheck side | 13:29 |
sdague | because processing a job type requires actually knowing all the files that might need to have gotten to elastic search, as there is a delay | 13:30 |
portante | yes, thanks, understood | 13:30 |
sdague | portante: and, in general, we don't want to do auto recheck, because experience has shown that no one actually looks at the issues | 13:31 |
*** dcramer_ has quit IRC | 13:31 | |
sdague | the point of e-r is to help us classify the "worst" races we are seeing and grouping them, so people can prioritize these | 13:32 |
sdague | and get them fixed | 13:32 |
portante | developer frustration with rechecks is still growing though, and we need to address that too. | 13:33 |
sdague | portante: sure, and the way to fix that is to fix the underlying issues | 13:33 |
sdague | because if we just autorechecked, all it would mean is the gate merge time would grow to over a day as everything crashes through, blows up, is automatically readded. | 13:34 |
portante | sdague: certainly. though I am thinking that if a job fails because of an infra issue, and it can be moved to another instance and retried, that seems like a worthwhile investment | 13:34 |
portante | sdague, I am not suggesting recheck the entire job | 13:35 |
portante | just have the ci re-run the docs job on another instance when it detects that there is an infrastructure issue | 13:35 |
sdague | portante: sure, there could be infra recovery for exactly this kind of issue. I'd like the ci team to address that | 13:35 |
portante | where do they live? #openstack-ci | 13:35 |
sdague | here | 13:35 |
sdague | ci/infra | 13:36 |
sdague | but the core team is basically west coast US, plus fungi on the east US, so they aren't awake yet | 13:36 |
*** oubiwan__ has joined #openstack-infra | 13:37 | |
*** paul-- has quit IRC | 13:38 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Stories and Tasks search https://review.openstack.org/60515 | 13:40 |
portante | sdague: thanks | 13:40 |
*** jcoufal has quit IRC | 13:41 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Added basic popup messages https://review.openstack.org/59706 | 13:41 |
*** dhellmann_ is now known as dhellmann | 13:42 | |
*** dolphm has quit IRC | 13:43 | |
*** weshay has joined #openstack-infra | 13:48 | |
*** dkliban has quit IRC | 13:50 | |
*** dolphm has joined #openstack-infra | 13:51 | |
*** dolphm_ has joined #openstack-infra | 13:52 | |
*** dprince has joined #openstack-infra | 13:55 | |
*** dolphm has quit IRC | 13:56 | |
*** bpokorny has joined #openstack-infra | 13:57 | |
*** sarob has quit IRC | 13:58 | |
*** rongze has quit IRC | 13:59 | |
*** oubiwan__ has quit IRC | 14:02 | |
openstackgerrit | Cyril Roelandt proposed a change to openstack/requirements: HTTPretty: update to 0.7.1 https://review.openstack.org/61981 | 14:07 |
*** jpich has quit IRC | 14:07 | |
*** mriedem has joined #openstack-infra | 14:08 | |
sdague | ttx: I have a new hack idea, if you want to try it with your email thing | 14:09 |
sdague | any time a bug gets too big to modify via the web, add launchpad as an affected project | 14:09 |
sdague | with a comment that launchpad is getting added because we can no longer modify this bug in launchpad | 14:10 |
ttx | my email thing is not magic, just applying https://help.launchpad.net/Bugs/EmailInterface | 14:10 |
sdague | I'm actually super annoyed that I've got 2 bugs in the tempest queue that are dead wood | 14:10 |
ttx | (just need your PGP publickey registered with LP) | 14:10 |
sdague | #1179008 rename requires files to standard names | 14:10 |
ttx | sdague: maybe the other one is not as blocked | 14:10 |
sdague | #1214176 Fix copyright headers to be compliant with Foundation policies | 14:11 |
ttx | let me try that second one | 14:11 |
sdague | could you get the LP team to just delete those bugs entirely | 14:11 |
*** ruhe is now known as ruhe_ | 14:12 | |
ttx | bah, submit request failure | 14:12 |
ttx | sdague: they usually reply to launchpad questions. Let me try that | 14:13 |
sdague | I think we should just delete any bug that's gotten out of control, because it just causes problems with projects that show up late and try to fix it | 14:14 |
*** oubiwan__ has joined #openstack-infra | 14:15 | |
*** vkozhukalov has quit IRC | 14:15 | |
fungi | what's the urgent machine to remove? | 14:16 |
sdague | fungi: one sec | 14:16 |
*** jcoufal has joined #openstack-infra | 14:17 | |
*** blamar has quit IRC | 14:17 | |
*** ruhe_ has quit IRC | 14:18 | |
sdague | fungi: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiamF2YS5pby5JT0V4Y2VwdGlvblwiICAgQU5EIG1lc3NhZ2U6XCJSZW1vdGUgY2FsbCBvblwiICAgQU5EIG1lc3NhZ2U6XCJmYWlsZWRcIiAgIEFORCBmaWxlbmFtZTpcImNvbnNvbGUuaHRtbFwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiJhbGwiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzg2OTQ0Mjg0Nzg3fQ== | 14:18 |
sdague | precise 14 and precise 20 it seems | 14:18 |
ttx | sdague: let's see how that goes: https://answers.launchpad.net/launchpad/+question/240748 | 14:19 |
fungi | sdague: i'll dampen them | 14:19 |
*** SushilKM has quit IRC | 14:20 | |
fungi | it's possible something went weird with the slave agent connection to them when we rebooted jenkins02 | 14:20 |
*** eharney has joined #openstack-infra | 14:22 | |
ttx | fungi: let us know, we'll restart the stable/* jobs afterwards | 14:24 |
fungi | they're already offline as of a minute or so | 14:25 |
ttx | fungi: ok, retrying then. | 14:25 |
fungi | https://jenkins02.openstack.org/computer/precise14/ and https://jenkins02.openstack.org/computer/precise20/ | 14:25 |
*** russellb is now known as rustlebee | 14:25 | |
fungi | i'll work some magic to get them back into service | 14:25 |
ttx | fungi: what is the appropriate keyword to reverify in that case ? | 14:26 |
*** sarob has joined #openstack-infra | 14:26 | |
ttx | I can abuse bug 1260654 | 14:26 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Undecided,Confirmed] https://launchpad.net/bugs/1260654 | 14:26 |
*** flaper87|afk is now known as flaper87 | 14:26 | |
fungi | ttx: you can just reapprove them instead of using reverify if you're core for that (which you are), or if we have a bug open on this already then you could reverify against that bug | 14:26 |
fungi | that works | 14:27 |
apevec | ttx, why would that be abuse? | 14:27 |
ttx | apevec: it may or may not match exactly that error :) | 14:27 |
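The Gerrit comment syntax under discussion is simply the following (a sketch; as noted above, the bug may or may not match the actual failure):

```
reverify bug 1260654
```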
fungi | for the record, slave agent failures look likely... https://jenkins02.openstack.org/job/gate-nova-python27/13301/consoleText https://jenkins02.openstack.org/job/gate-neutron-docs/3708/consoleText | 14:27 |
*** dansmith is now known as damnsmith | 14:27 | |
*** oubiwan__ has quit IRC | 14:28 | |
apevec | yeah, some had multiple failures, I've sent 2013.2.1 update email to you specifying what failed where | 14:28 |
fungi | precise14 seemed to be dying straight away, but precise20 was getting through the job and then bailing on artifact collection | 14:28 |
ttx | apevec: horizon/heat requirements sync reverified | 14:28 |
apevec | thanks | 14:28 |
*** yamahata_ has joined #openstack-infra | 14:29 | |
*** prad has joined #openstack-infra | 14:32 | |
*** ilyashakhat has joined #openstack-infra | 14:35 | |
*** ilyashakhat has quit IRC | 14:36 | |
*** ruhe has joined #openstack-infra | 14:36 | |
*** bknudson has joined #openstack-infra | 14:36 | |
*** sarob has quit IRC | 14:37 | |
*** andreaf has quit IRC | 14:38 | |
*** saper_ is now known as saper | 14:38 | |
fungi | precise14 and 20 rebooted and back in service, watching to make sure jobs complete on them now | 14:39 |
fungi | this ran to completion on precise14... https://jenkins02.openstack.org/job/gate-puppet-neutron-puppet-syntax/83/console | 14:39 |
fungi | and this on precise20... https://jenkins02.openstack.org/job/gate-puppet-neutron-puppet-unit-2.7/121/console | 14:40 |
fungi | should be sane now | 14:41 |
*** Abhishe__ has joined #openstack-infra | 14:41 | |
apevec | fungi, so what was it? | 14:41 |
*** smarcet has joined #openstack-infra | 14:41 | |
apevec | dolphm_, please approve https://review.openstack.org/61425 | 14:41 |
fungi | java exceptions when the master was trying to communicate with the slave agent. there's every chance they lost their sanity during the reboot of jenkins02 last night | 14:42 |
*** dstanek has quit IRC | 14:42 | |
fungi | well, s/reboot/restart/ | 14:42 |
openstackgerrit | David Kranz proposed a change to openstack-infra/devstack-gate: Always dump errors to console https://review.openstack.org/61850 | 14:43 |
*** xchu has joined #openstack-infra | 14:43 | |
fungi | jenkins01 shot itself in the head last night over a jvm oom condition so we had to scramble to get everything back up and running after that, and noticed jenkins02 was using at least as much memory as 01 had been so we did a controlled restart of jenkins on it as well | 14:43 |
*** xchu has quit IRC | 14:43 | |
*** dstanek has joined #openstack-infra | 14:43 | |
sdague | fungi: yeh, seems like it would be nice to have something that auto downs these nodes on a jenkins stack trace capture | 14:44 |
fungi | but in the process, i'm betting something happened with slave agent communication to precise14 and 20 as it booted back up | 14:44 |
*** rongze has joined #openstack-infra | 14:44 | |
sdague | this seems to happen every 3 weeks or so, and basically kills a whole dev cycle for .eu | 14:44 |
fungi | sdague: i wonder whether there's a jenkins plugin for that | 14:44 |
*** xchu has joined #openstack-infra | 14:44 | |
*** rnirmal has joined #openstack-infra | 14:44 | |
*** HenryG_ has joined #openstack-infra | 14:45 | |
ruhe | fungi, sdague: i guess a monitoring system might be enough to prevent such events | 14:45 |
fungi | sdague: but regardless, we're already in progress shifting jobs to single-use slaves, which is our preferred near-term solution to this (as opposed to the longer-term "get rid of jenkins entirely" solution) | 14:45 |
*** jd__ has quit IRC | 14:46 | |
*** iv_m has quit IRC | 14:46 | |
*** jd__ has joined #openstack-infra | 14:46 | |
*** iv_m has joined #openstack-infra | 14:46 | |
*** hughsaunders has quit IRC | 14:46 | |
fungi | ruhe: interestingly, probably not. there was nothing outwardly unusual about the condition of those slaves. we'd need to interrogate the jenkins master and have it perform some sort of communication and artifact collection tests as a canary | 14:46 |
*** hughsaunders has joined #openstack-infra | 14:46 | |
*** xchu has quit IRC | 14:46 | |
fungi | nontrivial | 14:46 |
fungi | probably special jobs which would need to be run between normal jobs to detect a condition like that | 14:47 |
*** xchu has joined #openstack-infra | 14:47 | |
*** xchu has quit IRC | 14:47 | |
*** blamar has joined #openstack-infra | 14:48 | |
fungi | sdague: at the moment, there are already a handful of infra jobs we've shifted from long-running slaves to bare (non-devstack) single-use slaves, with great success. it's just a matter of slowly shifting the remainder | 14:48 |
*** xchu has joined #openstack-infra | 14:48 | |
* fungi will brb | 14:48 | |
*** HenryG has quit IRC | 14:49 | |
*** dkliban has joined #openstack-infra | 14:50 | |
*** andreaf has joined #openstack-infra | 14:58 | |
*** jcooley_ has joined #openstack-infra | 15:00 | |
*** markmcclain has joined #openstack-infra | 15:02 | |
*** esker has joined #openstack-infra | 15:05 | |
dolphm_ | apevec: done | 15:06 |
apevec | thanks! | 15:06 |
*** rcleere has joined #openstack-infra | 15:09 | |
*** pabelanger has joined #openstack-infra | 15:09 | |
*** rongze has quit IRC | 15:11 | |
*** jasond has joined #openstack-infra | 15:12 | |
*** dcramer_ has joined #openstack-infra | 15:14 | |
*** basha has joined #openstack-infra | 15:16 | |
*** ryanpetrello has joined #openstack-infra | 15:18 | |
*** markmcclain has quit IRC | 15:19 | |
*** apevec has quit IRC | 15:21 | |
*** alcabrera has joined #openstack-infra | 15:23 | |
*** datsun180b has joined #openstack-infra | 15:26 | |
*** sarob has joined #openstack-infra | 15:26 | |
*** oubiwan__ has joined #openstack-infra | 15:29 | |
*** markmcclain has joined #openstack-infra | 15:30 | |
*** dolphm_ has quit IRC | 15:30 | |
*** jcoufal has quit IRC | 15:30 | |
*** zehicle_at_dell has joined #openstack-infra | 15:31 | |
*** rwsu has joined #openstack-infra | 15:31 | |
*** rcarrillocruz has joined #openstack-infra | 15:33 | |
mriedem | just opened this against infra, not sure if it's a known issue yet or not: | 15:33 |
mriedem | https://bugs.launchpad.net/openstack-ci/+bug/1260767 | 15:33 |
uvirtbot | Launchpad bug 1260767 in openstack-ci "gate-nova-docs fails on master with "Remote call on precise14 failed"" [Undecided,New] | 15:33 |
*** rcarrillocruz1 has quit IRC | 15:35 | |
*** xchu has quit IRC | 15:36 | |
portante | mreidem: saw that earlier | 15:37 |
*** jcooley_ has quit IRC | 15:37 | |
*** SushilKM has joined #openstack-infra | 15:37 | |
portante | I think 1260654 | 15:38 |
*** jcooley_ has joined #openstack-infra | 15:38 | |
portante | https://bugs.launchpad.net/openstack-ci/+bug/1260654 | 15:38 |
*** dizquierdo has quit IRC | 15:38 | |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Critical,Fix released] | 15:38 |
portante | sdague filed that, I think | 15:38 |
*** rongze has joined #openstack-infra | 15:39 | |
sdague | portante actually ttx | 15:42 |
sdague | but you are right, it's a dup | 15:42 |
*** iv_m has quit IRC | 15:43 | |
* ttx admits only having checked the rechecks page | 15:43 | |
*** jcooley_ has quit IRC | 15:43 | |
*** marun has quit IRC | 15:43 | |
mriedem | thanks guys | 15:43 |
*** bnemec is now known as beekneemech | 15:45 | |
*** dims has quit IRC | 15:45 | |
*** dims has joined #openstack-infra | 15:47 | |
*** zehicle_at_dell has quit IRC | 15:51 | |
fungi | yassine: i've checked out our unit-test slaves and it doesn't look like the centos6 slaves have a zookeeper-server installed. in fact, it doesn't appear that centos 6.4 provides an rpm for any package named zookeeper-server in its standard yum package repositories | 15:52 |
*** NikitaKonovalov has quit IRC | 15:53 | |
fungi | yassine: the corresponding "zookeeper" package is installed on our ubuntu precise slaves however (both our python 2.7 and python 3.3 slave variants) | 15:53 |
*** mriedem has quit IRC | 15:56 | |
*** maurosr has quit IRC | 15:56 | |
fungi | yassine: http://paste.openstack.org/show/54962/ | 15:57 |
*** mriedem has joined #openstack-infra | 15:57 | |
*** sarob has quit IRC | 15:58 | |
*** mfer has quit IRC | 15:58 | |
*** maurosr has joined #openstack-infra | 15:59 | |
*** rossella_s has quit IRC | 16:00 | |
*** mdenny has joined #openstack-infra | 16:01 | |
*** rnirmal_ has joined #openstack-infra | 16:02 | |
*** rnirmal has quit IRC | 16:02 | |
*** rnirmal_ is now known as rnirmal | 16:02 | |
*** mfer has joined #openstack-infra | 16:03 | |
sdague | fungi: so https://bugs.launchpad.net/tempest/+bug/1260710 | 16:03 |
openstackgerrit | A change was merged to openstack-infra/release-tools: Add mpcut.sh for milestone-proposed branch cutting https://review.openstack.org/61389 | 16:03 |
uvirtbot | Launchpad bug 1260710 in tempest "testr lists both tests and unit tests in gate-tempest-python27 job" [High,In progress] | 16:03 |
sdague | it marked it as in progress, but didn't post the review | 16:03 |
*** talluri has quit IRC | 16:04 | |
sdague | which I find highly annoying | 16:04 |
sdague | and that seems to be the norm now | 16:04 |
*** talluri has joined #openstack-infra | 16:04 | |
sdague | is that intended, or a bug? | 16:04 |
jasond | is "reverify no bug" okay to use? if not, how would i go about identifying the bug to reverify? | 16:05 |
fungi | sdague: what's the corresponding review for that bug? it doesn't seem to be the one mentioned in the bug description | 16:06 |
sdague | fungi: https://review.openstack.org/#/c/62019/ | 16:06 |
fungi | sdague: https://review.openstack.org/#/c/62019/1..2//COMMIT_MSG | 16:07 |
fungi | that's why | 16:07 |
sdague | oh, right mtreinish failed on commit message | 16:07 |
mtreinish | sdague: did I have a period? | 16:08 |
sdague | mtreinish: Fixes bug doesn't link | 16:08 |
fungi | update_bug.py is not smart enough to know whether or not it's posted the review link, so it errs on the side of not spamming people on every patchset with another bug comment and just does it if it's patchset #1 | 16:08 |
sdague | Closes-Bug: #............... | 16:08 |
fungi | the diff there shows that he added the bug header on comment #2, which is one reason | 16:09 |
fungi | er, on patchset #2 | 16:09 |
mtreinish | sdague: sigh... ok I'll respin it | 16:09 |
*** basha has quit IRC | 16:09 | |
*** jcooley_ has joined #openstack-infra | 16:09 | |
mtreinish | fungi: will that be enough? | 16:09 |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce import group ordering https://review.openstack.org/54403 | 16:09 |
jeblair | i'm in favor of tightening the gerrit regex so it matches in the webui | 16:09 |
openstackgerrit | Sahid Orentino Ferdjaoui proposed a change to openstack/requirements: Tox fails to build environment because of MySQL-Python version https://review.openstack.org/62027 | 16:10 |
sdague | yeh, it would be more obvious if the behavior was the same on both | 16:10 |
fungi | using a standardized bug header will also help (so that it will also close the bug) but the reason it set in-progress and didn't comment with the link in the bug is that you didn't have a bug header on the initial patchset | 16:10 |
sdague | fungi: sure | 16:11 |
fungi | jeblair: i think we tightened the regex in gerrit as much as we could without losing links on review comments like "recheck bug 12345" | 16:11 |
uvirtbot | Launchpad bug 12345 in isdnutils "isdn does not work, fritz avm (pnp?)" [Medium,Fix released] https://launchpad.net/bugs/12345 | 16:11 |
sdague | but if it gets fixed now will it post? | 16:11 |
*** jasond has left #openstack-infra | 16:11 | |
*** Ryan_Lane has joined #openstack-infra | 16:12 | |
openstackgerrit | Sahid Orentino Ferdjaoui proposed a change to openstack/requirements: Tox fails to build environment because of MySQL-Python version https://review.openstack.org/62028 | 16:12 |
*** talluri has quit IRC | 16:13 | |
fungi | sdague: i can't remember if update_bug.py will set it to fix committed/released on a string like "Fixes bug 1260710" though i'm pretty sure "Fixes-bug: 1260710" works (even though closes is the recommended term in the wiki) | 16:13 |
uvirtbot | Launchpad bug 1260710 in tempest "testr lists both tests and unit tests in gate-tempest-python27 job" [High,In progress] https://launchpad.net/bugs/1260710 | 16:13 |
fungi | the goal being to drive contributors toward using standard git header formats for these so they can be more easily mined from commit logs in the future | 16:14 |
*** rongze has quit IRC | 16:14 | |
fungi | oh, also it should have been in the final paragraph of the commit message to be a proper header | 16:14 |
fungi | that extra blank line makes it not | 16:14 |
fungi | mtreinish: ^ | 16:15 |
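For reference, the standard form keeps the bug reference as a header in the final paragraph of the commit message. A hypothetical example (the subject and body text are invented for illustration; only the bug number comes from this discussion):

```
Filter unit tests out of the tempest testr list

Make the gate-tempest-python27 job list and run only tempest's own
tests instead of also picking up the unit tests.

Closes-Bug: #1260710
```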
*** jasond has joined #openstack-infra | 16:15 | |
jasond | does anybody know why this review says "Need Verified"? https://review.openstack.org/#/c/59851/ | 16:16 |
mtreinish | fungi: seriously | 16:17 |
mtreinish | do I need to do another revision? | 16:17 |
fungi | mtreinish: nope--just pointing out if you're trying to correct the commit message, that's part of it | 16:17 |
fungi | jasond: taken care of | 16:18 |
*** AaronGr_Zzz is now known as AaronGr | 16:18 | |
jasond | fungi: thanks! | 16:18 |
*** ilyashakhat_ has quit IRC | 16:20 | |
*** rongze has joined #openstack-infra | 16:21 | |
*** jcooley_ has quit IRC | 16:21 | |
*** jcooley_ has joined #openstack-infra | 16:21 | |
*** zehicle_at_dell has joined #openstack-infra | 16:22 | |
*** AaronGr is now known as AaronGr_afk | 16:22 | |
*** jcooley_ has quit IRC | 16:25 | |
*** sarob has joined #openstack-infra | 16:26 | |
*** hashar has quit IRC | 16:27 | |
*** sarob has quit IRC | 16:31 | |
*** zehicle has joined #openstack-infra | 16:32 | |
yassine | fungi: thank you for the information !! Do you know how could i fix this issue ? :/ | 16:33 |
*** zehicle_at_dell has quit IRC | 16:33 | |
*** johnthetubaguy has quit IRC | 16:33 | |
*** saschpe_ has quit IRC | 16:33 | |
*** johnthetubaguy1 has joined #openstack-infra | 16:33 | |
*** niska has quit IRC | 16:34 | |
*** mrodden1 has quit IRC | 16:34 | |
*** saschpe has joined #openstack-infra | 16:34 | |
fungi | yassine: i left a review comment on the change you linked, but in short unless you can get a zookeeper-server rpm into centos 6 main repositories or fedora epel such that we can yum install it on the test slaves, your other option for python 2.6 unit testing right now would be figuring out whether it can be installed and used locally in the jenkins user's home directory by your unit test job without | 16:35 |
fungi | needing root permissions on the system | 16:35 |
fungi | i'm not familiar enough with what zookeeper is or how it works to know whether that's possible | 16:35 |
*** ^d has joined #openstack-infra | 16:35 | |
*** dkliban has quit IRC | 16:37 | |
*** hughsaunders has quit IRC | 16:37 | |
*** prad has quit IRC | 16:37 | |
*** yamahata_ has quit IRC | 16:37 | |
*** changbl has quit IRC | 16:37 | |
*** openstackgerrit has quit IRC | 16:37 | |
*** Ghe_HPDiscover has quit IRC | 16:37 | |
*** juice has quit IRC | 16:37 | |
*** tian has quit IRC | 16:37 | |
*** iccha has quit IRC | 16:37 | |
*** Alex_Gaynor has quit IRC | 16:37 | |
*** jasond has quit IRC | 16:37 | |
*** hughsaunders_ has joined #openstack-infra | 16:37 | |
*** changbl_ has joined #openstack-infra | 16:37 | |
*** hughsaunders_ is now known as hughsaunders | 16:38 | |
*** nicedice has joined #openstack-infra | 16:38 | |
*** tian has joined #openstack-infra | 16:38 | |
*** jasond has joined #openstack-infra | 16:38 | |
*** dkliban has joined #openstack-infra | 16:38 | |
*** prad has joined #openstack-infra | 16:38 | |
*** yamahata_ has joined #openstack-infra | 16:38 | |
*** openstackgerrit has joined #openstack-infra | 16:38 | |
*** Ghe_HPDiscover has joined #openstack-infra | 16:38 | |
*** juice has joined #openstack-infra | 16:38 | |
*** iccha has joined #openstack-infra | 16:38 | |
*** Alex_Gaynor has joined #openstack-infra | 16:38 | |
*** niska has joined #openstack-infra | 16:38 | |
*** SushilKM has quit IRC | 16:40 | |
*** prad_ has joined #openstack-infra | 16:42 | |
*** StevenK_ has joined #openstack-infra | 16:42 | |
*** johnthetubaguy1 has quit IRC | 16:43 | |
*** SergeyLukjanov has quit IRC | 16:43 | |
*** lcestari has quit IRC | 16:44 | |
*** iccha_ has joined #openstack-infra | 16:44 | |
*** lcestari has joined #openstack-infra | 16:44 | |
*** StevenK has quit IRC | 16:44 | |
*** guitarzan has quit IRC | 16:44 | |
yassine | fungi: oh thanks! If i can wget the zookeeper tar then i can run the zookeeper server without root permissions, is it possible to wget from the slave ? | 16:44 |
*** Ghe_HPDi1cover has joined #openstack-infra | 16:45 | |
*** jasond` has joined #openstack-infra | 16:45 | |
dhellmann | mordred: responding to your query from monday, no I don't see a 1.2 release of oslo.messaging. Did you already talk to markmc about it? | 16:45 |
dhellmann | clarkb, sdague: responding to your comment from monday about overloading the branch-designator for wsme/pecan gate jobs, I'm not sure what that means. :-( | 16:46 |
*** zehicle has quit IRC | 16:46 | |
ruhe | is puppet-dashboard.openstack.org supposed to render some html on port 3000? | 16:47 |
*** johnthetubaguy has joined #openstack-infra | 16:47 | |
* dhellmann is happy for irc client history, but needs a better tool for dealing with irc while traveling | 16:47 | |
fungi | yassine: yes, that's fine, just be aware that downloads from the internet sometimes fail, especially if it's a large file, so it could cause your job to occasionally return a false negative result | 16:47 |
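A rough sketch of that wget approach, assuming a 3.4.x ZooKeeper tarball from the Apache archive (the version and URL are assumptions); everything runs from the jenkins user's home directory without root:

```sh
# fetch and unpack zookeeper locally on the slave (no root needed)
wget -q https://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar xzf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
# the shipped sample config listens on 2181 and stores data under /tmp
cp conf/zoo_sample.cfg conf/zoo.cfg
bin/zkServer.sh start
```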
*** pcrews has joined #openstack-infra | 16:47 | |
*** johnthetubaguy has quit IRC | 16:48 | |
*** guitarzan has joined #openstack-infra | 16:48 | |
*** johnthetubaguy has joined #openstack-infra | 16:48 | |
*** juice- has joined #openstack-infra | 16:48 | |
fungi | ruhe: it would if it were working, but it broke. there's a project underway to replace it with something called puppetboard (anteaya, Hunner and pleia2 are collaborating on it last i heard) | 16:48 |
ruhe | fungi: got it, thanks | 16:49 |
*** tian has quit IRC | 16:49 | |
*** dkliban has quit IRC | 16:49 | |
*** prad has quit IRC | 16:49 | |
*** yamahata_ has quit IRC | 16:49 | |
*** openstackgerrit has quit IRC | 16:49 | |
*** Ghe_HPDiscover has quit IRC | 16:49 | |
*** juice has quit IRC | 16:49 | |
*** iccha has quit IRC | 16:49 | |
*** Alex_Gaynor has quit IRC | 16:49 | |
*** jasond has quit IRC | 16:49 | |
*** juice- is now known as juice | 16:49 | |
*** prad_ is now known as prad | 16:49 | |
fungi | ruhe: puppet dashboard is unfortunately somewhat fragile, and was further complicated by boundlessly growing its mysql db until we couldn't effectively clean or resize it, so we eventually stopped trying while the replacement project is underway | 16:50 |
fungi | (there are simply too few of us to limp too many broken systems along indefinitely) | 16:51 |
*** esker has quit IRC | 16:51 | |
*** SushilKM has joined #openstack-infra | 16:51 | |
ruhe | let's hope puppetboard doesn't have these issues. i'll try to install it in my infra copy and see how it goes | 16:52 |
fungi | ruhe: it sounds like it will work out much better. lighter weight and actually supported (puppet dashboard was effectively dead upstream, we were running somewhat of a fork, since puppetlabs had moved to recommending their proprietary dashboard instead) | 16:53 |
fungi | ruhe: however it needs puppetdb, which we hadn't previously been using, so i think they're working on getting a manifest together to install that along with puppetboard | 16:53 |
*** tian has joined #openstack-infra | 16:54 | |
*** yamahata_ has joined #openstack-infra | 16:54 | |
*** Alex_Gaynor has joined #openstack-infra | 16:54 | |
fungi | the alternative we'd explored was switching to the sodabrew fork of puppet dashboard, since its upstream was also somewhat active still | 16:55 |
yassine | fungi: perfect ! i will wget then, it will simplify my script :) thank you for your help i really appreciate | 16:55 |
*** jcooley_ has joined #openstack-infra | 16:55 | |
*** danger_fo_away is now known as dangers | 16:55 | |
fungi | yassine: my pleasure--let me know if you have any other questions | 16:55 |
yassine | sure :) | 16:55 |
* fungi needs to disappear again for a moment, and will return shortly | 16:55 | |
sdague | do we have a bug bot anywhere? | 16:56 |
*** dkliban has joined #openstack-infra | 16:56 | |
sdague | I'd really like to get IRC message on new bugs | 16:56 |
sdague | for tempest, so we can basically keep new bugs down to 0 | 16:56 |
*** mrodden has joined #openstack-infra | 16:57 | |
clarkb | sdague: soren has one. it subscribes to bugs and alerts on imap entries | 16:58 |
clarkb | dhellman: basically in that designator you put a string saying this is a wsme/pecan job | 16:59 |
dhellmann | clarkb: does that go in the job definition in one of the yaml files? | 16:59 |
clarkb | everything else about the job matches the openstack gate so you stay in sync without mutual gating | 16:59 |
markmc | dhellmann, I didn't do a 1.2 release of oslo.messaging | 16:59 |
clarkb | dhellman: in projects.yaml where you instantiate the template | 16:59 |
dhellmann | markmc: I don't see any releases on pypi, should we do one? | 17:00 |
markmc | dhellmann, no, it wasn't in havana - first release will be in icehouse, and that will be 1.3 | 17:00 |
markmc | dhellmann, was going to do 1.2 when it looked like it was going to be in havana | 17:01 |
markmc | dhellmann, IOW, there's still room for some API changes | 17:01 |
dhellmann | markmc: why 1.3 if there is not yet a 1.2? I feel like we've had this conversation... | 17:01 |
dhellmann | ah | 17:01 |
markmc | dhellmann, would like them to be minor at this point yet | 17:01 |
markmc | dhellmann, matching oslo.config, for no great reason | 17:01 |
*** markmcclain has quit IRC | 17:01 | |
dhellmann | markmc: ok, I didn't think we were worried about matching release versions across libraries like that, but we can talk about it | 17:02 |
dhellmann | clarkb: what does the branch-designator buy us? a separate gate queue? so pecan gate jobs don't clog up the openstack gate? | 17:02 |
clarkb | dhellmann: correct that plus staying in sync with the openstack gate | 17:03 |
Hunner | Guh. Still haven't done any puppetboard stuff... It's hard to do stuff so close to work, I think >_< | 17:03 |
*** dkliban has quit IRC | 17:04 | |
clarkb | rather than two different templates that can diverge there is one template that can create jobs with arbitrary names | 17:04 |
*** tma996 has joined #openstack-infra | 17:04 | |
dhellmann | clarkb, so I would add a "devstack-jobs" entry to the jobs list for pecan with pipeline=gate and branch-designator=pecan-wsme or something like that? | 17:06 |
dhellmann | clarkb: or maybe that pipeline should be different, too? | 17:06 |
*** jooools has quit IRC | 17:06 | |
*** UtahDave has joined #openstack-infra | 17:07 | |
*** SushilKM has quit IRC | 17:08 | |
*** talluri has joined #openstack-infra | 17:08 | |
*** nprivalova_ has joined #openstack-infra | 17:11 | |
clarkb | dhellman: no, that sounds fine. you may not want all of devstack-jobs though | 17:12 |
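A rough sketch of such a projects.yaml entry, assuming a devstack-jobs job group parameterized by pipeline and branch-designator as described above (the exact group, project, and variable names here are assumptions):

```yaml
# hypothetical JJB projects.yaml snippet; names are illustrative only
- project:
    name: pecan
    jobs:
      - devstack-jobs:
          pipeline: gate
          branch-designator: -pecan-wsme
```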
*** dstanek_afk has joined #openstack-infra | 17:12 | |
*** dstanek has quit IRC | 17:13 | |
*** nprivalova has quit IRC | 17:13 | |
*** nprivalova_ is now known as nprivalova | 17:13 | |
*** dstanek_afk is now known as dstanek | 17:13 | |
dhellmann | clarkb: yeah, we'll look at the list and verify before including all of them | 17:16 |
dhellmann | clarkb: thanks for the tips | 17:17 |
*** ruhe has quit IRC | 17:19 | |
jeblair | #status log restarted gerritbot | 17:21 |
*** openstackgerrit has joined #openstack-infra | 17:22 | |
*** SergeyLukjanov has joined #openstack-infra | 17:23 | |
*** zehicle_at_dell has joined #openstack-infra | 17:23 | |
*** Alex_Gaynor has quit IRC | 17:26 | |
*** Alex_Gaynor has joined #openstack-infra | 17:26 | |
mriedem | are jobs timing out at all right now? | 17:26 |
mriedem | http://logs.openstack.org/52/55752/14/check/check-tempest-dsvm-full/3eb1378/console.html | 17:26 |
mriedem | http://logs.openstack.org/52/55752/14/check/check-tempest-dsvm-full/3eb1378/console.html#_2013-12-13_16_53_45_134 | 17:27 |
*** SushilKM has joined #openstack-infra | 17:28 | |
jeblair | mriedem: http://status.openstack.org/zuul/ says many check jobs have succeeded recently | 17:30 |
mriedem | so hiccup? | 17:30 |
jeblair | mriedem: no, i believe there is a current nondeterministic bug that causes jobs to run very long and time out | 17:31 |
mriedem | jeblair: i opened https://bugs.launchpad.net/openstack-ci/+bug/1260816 to recheck against | 17:31 |
uvirtbot | Launchpad bug 1260816 in openstack-ci "check-tempest-dsvm-full job timed out causing build failure" [Undecided,New] | 17:31 |
*** yaguang has joined #openstack-infra | 17:31 | |
*** dolphm has joined #openstack-infra | 17:32 | |
jeblair | mriedem: https://bugs.launchpad.net/tempest/+bug/1258682 | 17:34 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] | 17:34 |
jeblair | mriedem: i will mark your bug as a dup of that | 17:34 |
mriedem | jeblair: ah, thanks, maybe i didn't find it searching for 'Build timed out' in LP, at least not in openstack-ci where i was looking | 17:34 |
jeblair | mriedem: i also tagged it with 'gate-failure' which we've recently started doing to try to make these easier to find | 17:35 |
jeblair | mriedem: [i understand, you see how long it took me to find it :( ] | 17:35 |
*** freyes has joined #openstack-infra | 17:36 | |
fungi | jeblair: so i suppose gerritbot didn't return after one of the more recent netsplits? (i saw it in and out a few times on earlier splits today already) | 17:37 |
*** markmcclain has joined #openstack-infra | 17:38 | |
*** freyes has quit IRC | 17:41 | |
*** johnthetubaguy has quit IRC | 17:43 | |
*** SushilKM has quit IRC | 17:43 | |
*** reed has joined #openstack-infra | 17:43 | |
*** johnthetubaguy has joined #openstack-infra | 17:43 | |
*** reed has quit IRC | 17:43 | |
*** reed has joined #openstack-infra | 17:43 | |
jeblair | fungi: possibly; it seemed to be running | 17:45 |
*** basha has joined #openstack-infra | 17:45 | |
*** sandywalsh has joined #openstack-infra | 17:45 | |
*** ruhe has joined #openstack-infra | 17:46 | |
notmyname | jog0: I put my gate status code and url-generating script online https://github.com/notmyname/gate_status | 17:48 |
*** SergeyLukjanov_ has joined #openstack-infra | 17:48 | |
*** AaronGr_afk is now known as AaronGr | 17:48 | |
*** SergeyLukjanov has quit IRC | 17:48 | |
*** dolphm has quit IRC | 17:49 | |
*** dkliban has joined #openstack-infra | 17:50 | |
*** tma996 has quit IRC | 17:51 | |
jeblair | notmyname: fyi there's a jquery plugin to build graphite urls; see it in action at the bottom of view-source:http://status.openstack.org/zuul/index.html | 17:52 |
clarkb | fungi: I am going to upgrade jenkins on jenkins-dev to 1.543 now | 17:53 |
jeblair | jog0's graph uses it too | 17:53 |
notmyname | jeblair: cool. (but that would mean javascript and then I'd have to add "front end design" to my linkedin page and then I'd get more recruiter spam and ...) | 17:53 |
jeblair | notmyname: definitely not worth it :) | 17:53 |
notmyname | hehe | 17:53 |
*** dolphm has joined #openstack-infra | 17:53 | |
fungi | clarkb: awesome | 17:54 |
*** sdake_ has joined #openstack-infra | 17:54 | |
notmyname | jeblair: 12 hour buckets, over the last 11 days (that's how long you keep data?) http://not.mn/gate_status.html | 17:54 |
jeblair | clarkb: do you have a script to submit a simulated job completion event to log-gearman-worker? | 17:54 |
clarkb | jeblair: I don't, the worker doesn't receive job completion events | 17:55 |
jeblair | notmyname: it's been 11 days since we renamed the jobs (and when we renamed them, we did not move the graphite data) | 17:55 |
zaro | good morning | 17:55 |
notmyname | jeblair: ah, gotcha | 17:56 |
yaguang | help needed, a change to requirements stable/grizzly is failing the jenkins gate | 17:56 |
jeblair | clarkb: i know, it's complicated. there are several places where you could inject an artificial event for testing; i'm assuming you have no scripts that inject events into any such places? :) | 17:56 |
jeblair | notmyname: otherwise we do keep data for a year | 17:57 |
yaguang | for this patch https://review.openstack.org/#/c/61237/ | 17:57 |
clarkb | jeblair: not really no, I typically just run the client and worker locally and hook them up to a jenkins feed. jenkins is busy enough to get events that way :) | 17:57 |
notmyname | jeblair: are there any events generated when the zuul pipeline gets reset? I'd _really_ like to track that number | 17:57 |
jeblair | clarkb: is jenkins zmq public? | 17:57 |
clarkb | jeblair: no, port forwarding is necessary | 17:58 |
clarkb | I may have a stand in client though /me looks | 17:58 |
clarkb | jeblair: I do have a simple stand in client | 17:59 |
clarkb | would you like a copy of that? | 17:59 |
jeblair | notmyname: not atm, however i do think such a thing is possible; probably in zuul.scheduler.Scheduler._processOneItem. | 18:00 |
jeblair | clarkb: that would be lovely | 18:00 |
jeblair | notmyname: if you wanted to hack on zuul :) | 18:00 |
*** gaelL_ has quit IRC | 18:00 | |
notmyname | jeblair: I'll add it to the todo list, but I can't promise it will be near the top | 18:00 |
jeblair | notmyname: note that "resets of head item" and "resets of any item" are probably both interesting and distinct | 18:00 |
*** gaelL has joined #openstack-infra | 18:01 | |
jeblair | notmyname: if you don't get to it, i will eventually. | 18:01 |
notmyname | jeblair: with that number (and I'm guessing it will be high since the overall chance of success is so low), I think you can get a good feel for the value of the pipeline approach. I suspect that the current pipeline isn't doing much besides keeping the DC warm | 18:01 |
*** gyee has joined #openstack-infra | 18:01 | |
clarkb | jeblair: http://paste.openstack.org/show/54967/ super simple | 18:02 |
clarkb | it provides only the necessary subset of event data that the gearman worker relies on | 18:02 |
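The paste itself isn't reproduced here, but a hypothetical stand-in publisher along those lines might look like the sketch below, assuming the "onFinalized <json>" framing emitted by the Jenkins ZMQ event publisher (the port number and the event fields are assumptions for illustration):

```python
import json
import time

import zmq

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:8888")  # assumed publisher port

# minimal fake build-completion event; field names are assumptions
event = {
    "name": "gate-noop",
    "build": {
        "number": 1,
        "status": "SUCCESS",
        "full_url": "https://jenkins-dev.openstack.org/job/gate-noop/1/",
    },
}

while True:
    socket.send_string("onFinalized " + json.dumps(event))
    time.sleep(10)
```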
jeblair | notmyname: not sure what you mean by 'pipeline approach'? | 18:02 |
clarkb | stopping jenkins now | 18:02 |
clarkb | * on jenkins-dev | 18:02 |
*** yolanda has quit IRC | 18:04 | |
fungi | yaguang: https://review.openstack.org/55939 only just merged a few hours ago to address the iso8601 issues preventing grizzly integration testing, so this is probably a new bug which was being hidden by that one | 18:04 |
notmyname | jeblair: optimistically queueing all the patches rather than doing them serially. | 18:04 |
*** freyes has joined #openstack-infra | 18:05 | |
*** harlowja has quit IRC | 18:05 | |
*** sandywalsh has quit IRC | 18:05 | |
fungi | i think as long as we manage to merge 1.5 changes an hour on average, the pipeline is going at least as fast as serial testing would | 18:05 |
yaguang | fungi, yes, the iso8601 issue disappeared | 18:05 |
*** sandywalsh has joined #openstack-infra | 18:05 | |
yaguang | fungi, it seems there is a new one + sudo chown -R jenkins /opt/stack/new/savanna-dashboard | 18:06 |
yaguang | 2013-12-13 16:33:06.574 | + cd /opt/stack/new/requirements | 18:06 |
yaguang | 2013-12-13 16:33:06.597 | + python update.py /opt/stack/new/savanna-dashboard | 18:06 |
yaguang | 2013-12-13 16:33:06.598 | Traceback (most recent call last): | 18:06 |
yaguang | 2013-12-13 16:33:06.598 | File "update.py", line 94, in <module> | 18:06 |
yaguang | 2013-12-13 16:33:06.599 | main(sys.argv[1:]) | 18:06 |
yaguang | 2013-12-13 16:33:06.621 | File "update.py", line 90, in main | 18:06 |
yaguang | 2013-12-13 16:33:06.621 | _copy_requires(req, argv[0]) | 18:06 |
yaguang | 2013-12-13 16:33:06.622 | File "update.py", line 71, in _copy_requires | 18:06 |
yaguang | 2013-12-13 16:33:06.622 | dest_reqs = _parse_reqs(dest_path) | 18:06 |
yaguang | 2013-12-13 16:33:06.623 | File "update.py", line 49, in _parse_reqs | 18:06 |
jeblair | notmyname: ah, yes; i'd refer to that as speculative execution. but yes, as the test-subject system's reliability decreases it degrades to its worst-case behavior which is serial merging. | 18:06 |
yaguang | 2013-12-13 16:33:06.651 | pip_requires = open(filename, "r").readlines() | 18:06 |
yaguang | 2013-12-13 16:33:06.676 | IOError: [Errno 2] No such file or directory: '/opt/stack/new/savanna-dashboard/tools/pip-requires' | 18:06 |
yaguang | 2013-12-13 16:33:06.751 | Process leaked file descriptors. See http://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build for more information | 18:06 |
yaguang | 2013-12-13 16:33:07.287 | Build step 'Execute shell' marked build as failure | 18:06 |
fungi | yaguang: please use http://paste.openstack.org/ in the future | 18:06 |
clarkb | fungi: 1.543 is running on jenkins-dev. Want to give nodepool a spin with the correct NODEPOOL_SSH_KEY value? | 18:07 |
notmyname | fungi: rounding up from the current status to assume a 70% chance of passing, that means that a queue of 10 patches has a 2.8% chance of not being reset (ie the 10th patch has a 2.8% chance of landing) | 18:07 |
fungi | yaguang: grizzly was broken for so long that there are likely to be new external/dependency-related issues which crept in during that span | 18:07 |
jeblair | notmyname: i think we have the configuration structured so that it shouldn't be _worse_ than serial merging. but yes, it's, um, providing some load for our providers. | 18:07 |
fungi | notmyname: makes sense. just pointing out that if the average duration of our longest tests is 0.75 hours then serial testing is only going to merge 1.5 changes an hour best case (assuming none fail) | 18:08 |
notmyname | current queue depth of 34 means a 0.00077% chance of landing | 18:08 |
fungi | 0.00077% chance of landing on that iteration, but it will be automatically retried until there are no failures ahead of it in the pipeline | 18:09 |
yaguang | fungi, to debug the issue, where can I find the source code for check-requirements-integration-dsvm gate ? | 18:09 |
*** dolphm has quit IRC | 18:09 | |
*** dolphm has joined #openstack-infra | 18:10 | |
notmyname | fungi: right. I'm saying that the last item in the queue pretty much doesn't stand a chance of getting through without a retry | 18:10 |
fungi | yaguang: in openstack-dev/pbr, openstack-infra/pypi-mirror and openstack-infra/config. i'll get you urls to the relevant files in just a moment | 18:10 |
notmyname | fungi: and the result is that there is a pretty low chance of doing much more than "serial speed", but now we have a bunch of servers wasting cycles for tests on patches that won't land | 18:11 |
*** jpeeler has quit IRC | 18:11 | |
*** dolphm has quit IRC | 18:11 | |
fungi | notmyname: agreed--there may be a numbers game to determining a sweet spot for maximum pipeline length beyond which it makes no sense to start jobs until you get closer to the head of the gate | 18:11 |
*** dolphm_ has joined #openstack-infra | 18:11 | |
notmyname | fungi: right | 18:11 |
*** jpeeler has joined #openstack-infra | 18:12 | |
*** dolphm_ has quit IRC | 18:12 | |
fungi | dependent on the current/recent average failure rate for changes | 18:12 |
jeblair | notmyname: one thing that has been considered is allowing elastic-recheck to see non-final job results to collect more data on bug frequency | 18:12 |
jeblair | notmyname: that is a potential use for the otherwise discarded test runs further down the queue | 18:12 |
notmyname | jeblair: that would be good. it would magnify problem areas | 18:12 |
fungi | i suspect the slave discard/rebuild overhead places the sweet spot somewhere in the vicinity of 50-100% use of the maximum pool size/quota aggregate too | 18:14 |
notmyname | jeblair: fungi: but the real source of the problem is that even a 5% pass rate drop has a _massive_ effect on the efficiency of the overall gate queue. the 34th item in the queue only has an 18.4% chance of landing with no retries even if the pass rate is 95% (as opposed to the current <70%) | 18:14 |
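The arithmetic behind those percentages is just the per-change pass rate compounded down the queue; a quick sketch (the figures quoted above presumably used slightly different rounding of the pass rate):

```python
# chance that the Nth change in the gate pipeline merges on this pass:
# it and every change ahead of it must all succeed
def chance_of_landing(pass_rate, depth):
    return pass_rate ** depth

for pass_rate in (0.70, 0.95):
    for depth in (10, 34):
        print("pass rate %.0f%%, queue depth %2d: %.5f%% chance of landing"
              % (pass_rate * 100, depth,
                 100 * chance_of_landing(pass_rate, depth)))
```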
fungi | since that eliminates nodepool's ability to get ahead of the node demand | 18:14 |
*** alcabrera is now known as alcabrera|afk | 18:14 | |
notmyname | and I don't think this is big revelation to anyone, but it's at lest a new way to see and track the data | 18:15 |
jeblair | notmyname: yep; that's the impetus behind jog0's effort to try to get on gate bugs early. | 18:15 |
jeblair | notmyname: ++ more visibility | 18:15 |
jeblair | okay, back to skynet for me | 18:16 |
*** freyes has quit IRC | 18:16 | |
*** ruhe has quit IRC | 18:16 | |
*** matel has quit IRC | 18:16 | |
clarkb | fungi: ready to nodepool on jenkins-dev? | 18:17 |
fungi | clarkb: we can. gimme just a minute | 18:17 |
clarkb | no rush, ping me when I should pay attention | 18:17 |
fungi | yaguang: the meat of that job is this script... https://git.openstack.org/cgit/openstack-dev/pbr/tree/tools/integration.sh | 18:18 |
yaguang | fungi, many thanks :) | 18:18 |
fungi | yaguang: the run-mirror command it's using to test building the set is http://git.openstack.org/cgit/openstack-infra/pypi-mirror/tree/pypi_mirror/cmd/run_mirror.py | 18:19 |
*** basha has quit IRC | 18:19 | |
fungi | yaguang: the job definition (the entry point for jenkins) can be seen at http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/jenkins_job_builder/config/requirements.yaml#n1 | 18:20 |
fungi | yaguang: and that job is actually running the integration test script within the context of these https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/ | 18:21 |
*** Ryan_Lane has quit IRC | 18:22 | |
fungi | so i guess four relevant projects involved in running that | 18:22 |
*** gyee has quit IRC | 18:22 | |
fungi | (not to mention the openstack/requirements project itself, where the projects list and global requirements reside_ | 18:22 |
yaguang | maybe some projects didn't have a pip-requires file initially | 18:23 |
yaguang | cloning savanna-dashboard | 18:24 |
yaguang | fungi, thanks a lot for the info | 18:24 |
*** harlowja has joined #openstack-infra | 18:24 | |
fungi | yaguang: actually, i think this may be more involved. we lack a lot of the mechanisms for requirements testing in grizzly since they were developed in the havana cycle and not all backported. we may be better off setting that job to only run on stable/havana and later for now | 18:25 |
fungi | mordred: ^ opinion? | 18:25 |
*** basha has joined #openstack-infra | 18:26 | |
fungi | yaguang: i have an outstanding change to backport some of that and try to get it working, but it was waiting on the iso8601 situation to clear up. at the moment i don't have any reasonable expectation that job will run correctly at all | 18:26 |
fungi | i'll propose a change real quick to exclude stable/grizzly until more of those bits are in place (though we're getting close enough to eol for that release that it may not make sense to invest much more time in requirements consistency there anyway) | 18:27 |
yaguang | fungi, I also have some backports that have been blocked for a long time | 18:28 |
clarkb | fungi: I would go along with that | 18:29 |
*** rwsu has quit IRC | 18:32 | |
harlowja | qq, are stackforge gate/merge jobs currently disabled? | 18:32 |
harlowja | wondering if i should kick https://review.openstack.org/#/c/60850/ to try to get it to move, or not worry about it yet | 18:33 |
*** mgagne1 has joined #openstack-infra | 18:33 | |
*** mgagne1 has quit IRC | 18:33 | |
*** mgagne1 has joined #openstack-infra | 18:33 | |
*** mgagne has quit IRC | 18:34 | |
*** mgagne has joined #openstack-infra | 18:34 | |
*** mgagne has quit IRC | 18:34 | |
*** mgagne has joined #openstack-infra | 18:34 | |
*** yaguang has quit IRC | 18:34 | |
fungi | harlowja: i'm not immediately seeing a good reason for 60850 not to be in progress on http://status.openstack.org/zuul/ so it may merit further investigation | 18:35 |
harlowja | k, another one of interest, http://logs.openstack.org/20/54220/36/check/gate-taskflow-pep8/444a457/console.html | 18:35 |
fungi | harlowja: there's nothing special going on for stackforge... it's treated the same as far as whether and when gating is started | 18:35 |
*** jergerber has joined #openstack-infra | 18:35 | |
harlowja | kk, thx fungi | 18:35 |
harlowja | fungi should i try to kick those (recheck no bug) or just leave them for a little? | 18:37 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Don't run requirements integration for Grizzly https://review.openstack.org/62055 | 18:37 |
fungi | harlowja: the failure in the log you linked can be rechecked or reverified against bug 1260654 | 18:38 |
uvirtbot | Launchpad bug 1260654 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [Critical,Fix released] https://launchpad.net/bugs/1260654 | 18:38 |
harlowja | k, thx fungi | 18:38 |
fungi | precise14 was not a happy camper this morning | 18:38 |
*** mgagne1 has quit IRC | 18:38 | |
*** basha has quit IRC | 18:38 | |
harlowja | :) | 18:39 |
fungi | harlowja: on 60850 i think you *may* have originally set approval without a +2 vote and then added the +2 vote after, which might be the reason. try removing and adding your approval on it | 18:40 |
harlowja | kk | 18:40 |
*** rwsu has joined #openstack-infra | 18:40 | |
*** zehicle_at_dell has quit IRC | 18:41 | |
*** gyee has joined #openstack-infra | 18:42 | |
*** herndon has joined #openstack-infra | 18:43 | |
*** johnthetubaguy has quit IRC | 18:44 | |
fungi | harlowja: looks like you undid your +2 code review on it (which you should add back) but did not remove your +1 approve (which is the one you need to reapply for zuul to notice) | 18:44 |
harlowja | ah | 18:44 |
harlowja | wrong one, thx | 18:44 |
fungi | glad to help | 18:44 |
harlowja | need more coffee, ha | 18:44 |
*** basha has joined #openstack-infra | 18:45 | |
*** rossella_s has joined #openstack-infra | 18:45 | |
*** zehicle_at_dell has joined #openstack-infra | 18:46 | |
fungi | harlowja: however, that theory didn't pan out since it's still not being tested. i think it may be because you have a draft change indirectly depending on it (61689), and i think we may still have a corner-case bug where if zuul can't retrieve and inspect the entire chain of dependent and reverse-dependent changes, it doesn't enqueue | 18:49 |
harlowja | ah | 18:49 |
fungi | zuul, like the rest of the general public, is blind to draft changes | 18:50 |
clarkb | fungi: this is where you say "don't use drafts" :) | 18:50 |
fungi | (one of the reasons we recommend against the draft feature) | 18:50 |
fungi | heh | 18:50 |
^d | Ugh, drafts. | 18:50 |
harlowja | ya, and 61689 seems hidden | 18:50 |
harlowja | hmmm | 18:50 |
*** rcarrillocruz1 has joined #openstack-infra | 18:51 | |
*** herndon has quit IRC | 18:51 | |
harlowja | so if that draft is not a draft but a WIP that should solve this? | 18:51 |
*** praneshp has joined #openstack-infra | 18:51 | |
clarkb | yes | 18:51 |
harlowja | k | 18:51 |
*** rcarrillocruz has quit IRC | 18:52 | |
harlowja | i think i know who owns that draft, will bug him | 18:52 |
fungi | harlowja: it should, yes, though one of the patches in the set will need reapproval again probably after you publish that draft | 18:52 |
fungi | so that zuul will notice | 18:52 |
*** mriedem has quit IRC | 18:52 | |
harlowja | k | 18:53 |
harlowja | thx guys | 18:53 |
*** apevec has joined #openstack-infra | 18:53 | |
fungi | the last time i looked into one of these, i found a traceback in zuul's log from where it tried to retrieve a reverse-dependent change which was in a draft state, and failed to enqueue the non-draft parent change as a result | 18:53 |
fungi | can't remember if i filed a bug or not | 18:53 |
harlowja | seems like it should almost skip over drafts completely | 18:53 |
apevec | mordred, https://review.openstack.org/61237 (grizzly reqs) failed on savanna, but savanna doesn't have grizzly branch afaict? | 18:54 |
fungi | well, it can't have any hope of skipping them if they're required for the change in question, but if they're merely draft changes requiring the non-draft change that seems like one we could do something about | 18:54 |
fungi | apevec: https://review.openstack.org/62055 | 18:55 |
apevec | what's requirements-integration test doing ? | 18:55 |
apevec | fungi, ah thanks | 18:55 |
clarkb | dkranz: re https://review.openstack.org/#/c/61850/ were my suggestions in patchset 2 not good? (I think the logic in patchset 3 is much more complicated than it needs to be) | 18:55 |
harlowja | fungi agreed, for reviews that are dependent on a draft, ya, nothing u can do, but the other way around (a draft dependent on a review) seems like u could just ignore that draft (and all its dependents, if any) | 18:56 |
*** basha has quit IRC | 18:56 | |
fungi | apevec: i think that job got added while grizzly was broken from iso8601 so we didn't think about the implications on that branch | 18:56 |
clarkb | harlowja: the greater problem is that drafts just don't work | 18:56 |
harlowja | or that clarkb :) | 18:56 |
clarkb | there are so many corner cases where they fall over. It isn't just zuul having a hard time | 18:57 |
fungi | the worst part of gerrit drafts, in my opinion, is that as the gerrit server admin you can't even disable them | 18:57 |
harlowja | easy/hard to remove the draft feature completely? | 18:57 |
harlowja | ah | 18:57 |
fungi | if it were a config option, i wouldn't care | 18:57 |
apevec | fungi, thanks, I've added comment in the review to prevent rechecks in vain | 18:57 |
fungi | instead it's baked in, non-optional and thus an attractive nuisance | 18:58 |
harlowja | fungi agreed | 18:58 |
fungi | also, the idea of stashing "hidden" changes in progress in the code review system runs pretty counter to what i think open development processes are all about | 18:59 |
*** alcabrera|afk is now known as alcabrera | 18:59 | |
*** yamahata_ has quit IRC | 18:59 | |
fungi | apevec: i think that requirements change is also probably not really absolutely necessary, since i'm pretty sure we don't do requirements enforcement on stable branches anyway (at least not for grizzly but i think also still not for havana either) | 19:00 |
*** mrodden has quit IRC | 19:00 | |
apevec | fungi, true, but it'd be nice to keep new updates somewhat synced | 19:01 |
fungi | (though the havana ones do need to get enforced. on my to do list to check back into the state of those) | 19:01 |
fungi | apevec: agreed | 19:02 |
clarkb | fungi: I am fiddling with passing NODEPOOL_SSH_KEY into the daemon env in the init script, but I am probably better off patching nodepool to accept that key as an option | 19:05 |
fungi | k. i'm free to help as soon as i finish filing this zuul bug i should have filed ages ago | 19:06 |
clarkb | awesome | 19:06 |
*** dstanek has quit IRC | 19:06 | |
clarkb | I am still reading source to try and figure out how this best fits in | 19:06 |
anteaya | any point in adding a comment in git review so that if someone does use the flag to submit a draft they know they are creating a painful situation? | 19:07 |
clarkb | half tempted to dump the public key literally into the yaml config file and just read it there | 19:07 |
anteaya | like "are you sure you want to create a draft? This will bite you later." | 19:07 |
clarkb | but then you have to sort out logic like kicking off image rebuilds if the config changes which I don't think exists today | 19:07 |
fungi | clarkb: it's not as if we don't do similar things elsewhere (though the path to a keyfile would certainly be nicer) | 19:08 |
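One hedged sketch of the init-script route being weighed here would be a defaults file the init script could source before starting the daemon (the path, and the assumption that the init script sources and exports it, are both hypothetical):

```sh
# /etc/default/nodepool (hypothetical) -- sourced by the init script so
# the daemon environment carries the key used when building images
NODEPOOL_SSH_KEY="ssh-rsa AAAAB3Nza... nodepool@nodepool-dev"
export NODEPOOL_SSH_KEY
```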
notmyname | anteaya: draft == WIP status? | 19:08 |
anteaya | notmyname: no, draft means only certain folks can see it | 19:08 |
notmyname | ah | 19:08 |
anteaya | WIP is a button in the gui for the patch | 19:08 |
pleia2 | better to use wip than draft | 19:08 |
*** mrodden has joined #openstack-infra | 19:08 | |
anteaya | submit and then push "work in progress" | 19:08 |
clarkb | except it isn't completely private when you draft, anyone can still fetch the code if they are smart | 19:09 |
anteaya | pleia2: yes | 19:09 |
fungi | we need to add a wip flag back into git-review but have been holding off until we see the state in gerrit 2.9 | 19:09 |
anteaya | kk | 19:09 |
anteaya | notmyname: and draft makes future operations on that patch a pain | 19:09 |
fungi | since the wip feature we're using now exists only in our own fork of gerrit 2.4 | 19:09 |
anteaya | which I believe was what was being discussed above | 19:09 |
*** mriedem has joined #openstack-infra | 19:10 | |
*** mgagne has quit IRC | 19:17 | |
sdague | so one place where I think it would be good for infra to auto recheck would be when any of the test results come back as UNSTABLE | 19:18 |
sdague | as that clearly was an infra fail | 19:18 |
*** sandywalsh has quit IRC | 19:19 | |
fungi | clarkb: i assume it's safe to blow away the bad images and nodes from last night's nodepool-dev experiments | 19:19 |
*** rongze has quit IRC | 19:20 | |
*** dims has quit IRC | 19:21 | |
clarkb | fungi: yup should be | 19:22 |
clarkb | they weren't used for anything | 19:22 |
clarkb | sdague: do we still have instances of UNSTABLE jobs making it to reporting? I think the problem there is that when zuul cancels jobs intentionally they sometimes report back as UNSTABLE | 19:23 |
*** mgagne has joined #openstack-infra | 19:23 | |
*** mgagne has quit IRC | 19:23 | |
*** mgagne has joined #openstack-infra | 19:23 | |
clarkb | but I suppose in those cases we would know why | 19:23 |
sdague | clarkb: https://review.openstack.org/#/c/61778/4 just got hit by it | 19:23 |
*** sharwell has joined #openstack-infra | 19:23 | |
sdague | because basically UNSTABLE is completely useless to a person, since it means there typically aren't any logs. So the only option is recheck no bug anyway | 19:24 |
clarkb | sdague: well we should always have the console log... | 19:25 |
*** CaptTofu has joined #openstack-infra | 19:25 | |
clarkb | but it is usually an infra problem | 19:25 |
sdague | clarkb: sometimes we don't | 19:25 |
fungi | right, depends on how long it's been | 19:25 |
jeblair | sdague: are you talking about errors from the bad jenkins slaves earlier? | 19:27 |
fungi | the other problem there is that jenkins will persist jobs to the same slaves if available | 19:27 |
sdague | jeblair: that might be what this was | 19:27 |
jeblair | fungi: was precise20 one of those? | 19:28 |
sdague | just trying to think about improvements to the system | 19:28 |
sdague | yes it was | 19:28 |
fungi | jeblair: yep | 19:28 |
fungi | these are things i expect will get better once we no longer run tests on long-lived slaves and use nodepool. not too much longer now | 19:28 |
*** jaypipes has joined #openstack-infra | 19:28 | |
jeblair | sdague: so zuul does re-launch jobs when it detects some kinds of jenkins failures | 19:29 |
sdague | jeblair: ok, so maybe expand that? | 19:29 |
jeblair | sdague: though obviously this isn't one of them. it's possible that retrying on unstable for this error would have made things better, inasmuch as it may have eventually been assigned to a node other than precise20 (possibly after retrying 200 times or something because of what fungi just pointed out) | 19:30 |
*** yassine has quit IRC | 19:30 | |
jeblair | sdague: but often retrying on unstable results isn't going to get us anywhere, and may make things worse (logs.o.o full as an example) | 19:30 |
fungi | http://logs.openstack.org/55/62055/1/check/gate-config-layout/5ad5222/console.html "Building remotely on bare-precise-rax-ord-850570..." | 19:31 |
jeblair | sdague: so i don't think that build result alone is enough to automate a retry on | 19:31 |
sdague | jeblair: so classifying the kind of problem is probably important | 19:31 |
sdague | honestly, an exception like that should down that node | 19:31 |
jeblair | sdague: i agree; i think that's a jenkins bug.... | 19:31 |
fungi | i bet retrying an unstable job once we're using bare nodepool nodes for them will be slightly more effective | 19:31 |
sdague | about every 3 weeks we have a node go wonky like that and destroy an entire development day for .eu | 19:31 |
sdague | because there is no one to solve that in that TZ | 19:32 |
jeblair | sdague: but we think that going to all-dynamic slaves will solve this problem | 19:32 |
sdague | jeblair: ok, well if that's close, cool | 19:32 |
fungi | sdague: see the link i posted | 19:33 |
fungi | we're already doing it some | 19:33 |
jeblair | sdague: it's very much in progress ^ :) | 19:33 |
sdague | jeblair: ok cool | 19:33 |
fungi | i did mention that in the bug when i resolved it as well | 19:34 |
sdague | every time we have a 'splode I just like to figure out "how does this problem never happen again" | 19:34 |
jeblair | sdague: this is a unit-test like job that ran on one: http://logs.openstack.org/54/61954/2/check/gate-config-layout/0702552/console.html | 19:34 |
sdague | fungi: sure, I guess timeline was the question | 19:34 |
jeblair | sdague: yes, me too. sometimes that involves a long multi-step process. fortunately we're near the end of this one. | 19:35 |
sdague | cool | 19:35 |
clarkb | the slave threading should help with this too | 19:35 |
fungi | i hope so | 19:36 |
sdague | yeh, it's just been a very bad gate week, and only slightly related to actual openstack bugs :) | 19:36 |
clarkb | I think errors like this are jenkins being unable to maintain 300 ssh connections with thousands of threads all vying for cpu cycles | 19:36 |
*** dims has joined #openstack-infra | 19:36 | |
fungi | i suspect precise14 and precise20 got into a bad state after we restarted jenkins02 (timeline seems about right) but wasn't obvious until we'd all gone to sleep | 19:37 |
jeblair | sdague: to be fair, i think the 30% failure rate in openstack is more than a slight contribution. | 19:37 |
*** talluri has quit IRC | 19:37 | |
jeblair | fungi, clarkb: they never recover, so i think it's more than just contention. | 19:37 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/nodepool: Allow for ssh key file path in config. https://review.openstack.org/62066 | 19:38 |
*** talluri has joined #openstack-infra | 19:38 | |
jeblair | clarkb: do you need that in asap? ^ | 19:38 |
clarkb | jeblair: I don't think so | 19:38 |
clarkb | jeblair: we will get by running nodepool in the foreground for now | 19:38 |
praneshp | hey all, was any of you able to run the docs test successfully after pinning sphinx<1.2? | 19:39 |
clarkb | praneshp: yes | 19:39 |
fungi | clarkb: we're also back to a clean slate now--old images and nodes deleted successfully | 19:39 |
clarkb | fungi: awesome | 19:39 |
sdague | jeblair: how are you computing that #? because while the SSH race is bad, it's not 30% bad | 19:39 |
clarkb | praneshp: you may need to update tox.ini to disable pip install --pre | 19:39 |
clarkb | praneshp: line 9 of nova's tox.ini does this | 19:40 |
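For context, the kind of tox.ini override being referred to probably looks something like the following; the exact line in nova's tox.ini may differ, so treat this as an illustrative sketch only:

    # tox.ini excerpt -- override tox's install command so that pip does not
    # pull in pre-release packages (such as the sphinx 1.2 betas) via --pre
    [testenv]
    install_command = pip install -U {opts} {packages}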
fungi | sdague: i'm guessing it was an instance of http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/5000/600/5652/5652.strip.gif | 19:40 |
praneshp | clarkb thanks. Let me look into my tox.ini | 19:41 |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 19:41 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 19:41 |
sdague | fungi: :) | 19:42 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/nodepool: Allow for ssh key file path in config. https://review.openstack.org/62066 | 19:42 |
*** sandywalsh has joined #openstack-infra | 19:43 | |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 19:43 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 19:43 |
jeblair | sdague: http://not.mn/gate_status.html | 19:44 |
fungi | yeah, that does seem to average out to about 30% failing | 19:44 |
sdague | jeblair: so that includes all the fails, including the infra fails, which currently are the #1 recheck bug | 19:44 |
fungi | based on recent freshness metrics presumably | 19:45 |
jeblair | sdague: the infra fails are the dip from a few hours ago. | 19:45 |
praneshp | clarkb i don't have a line relating to pip install --pre in my tox.ini https://review.openstack.org/#/c/61615/17/tox.ini | 19:45 |
clarkb | sdague: which bug is that? rechecks page says 1253896 which isn't infra | 19:45 |
clarkb | praneshp: do you have a line like line 9 in nova's tox.ini? | 19:45 |
praneshp | one sec. | 19:46 |
sdague | http://status.openstack.org/elastic-recheck/ | 19:46 |
jeblair | sdague: this chart is based on jog0's chart which, on the right edge, measures real-time failure rates of jobs | 19:46 |
praneshp | clarkb nope. | 19:46 |
clarkb | praneshp: that is what you need | 19:46 |
sdague | jeblair: so I'm not actually trying to argue who's more to blame here | 19:46 |
praneshp | ok, let me try, thanks | 19:46 |
sdague | I'm just saying, it's really hard to get anyone to look at the ssh bug when things are exploding for other reasons | 19:46 |
*** rossella_s has quit IRC | 19:47 | |
jeblair | sdague: sure. but you included some hyperbole in your statements that i don't think helped the situation. | 19:47 |
*** zehicle_at_dell has quit IRC | 19:47 | |
openstackgerrit | Matt Farina proposed a change to openstack-infra/config: New project request: PHP-Client https://review.openstack.org/62069 | 19:48 |
clarkb | fungi: were you going to start nodepool in the foreground? | 19:49 |
*** dolphm has joined #openstack-infra | 19:49 | |
clarkb | lunch should be starting here shortly but will do my best to pay attention | 19:49 |
*** Ryan_Lane has joined #openstack-infra | 19:50 | |
fungi | clarkb: yeah, i was going to try `sudo -u nodepool NODEPOOL_SSH_KEY=~jenkins/.ssh/id_rsa.pub nodepoold -d` in a screen session, but the jenkins public key isn't readable by the nodepool user so i'm pondering options | 19:52 |
anteaya | fwiw, we are working hard on the ssh bug, it is proving to be a tricky one, markmcclain salv-orlando beagles and dkehn are all working on it right now | 19:53 |
anteaya | will update when we have anything | 19:53 |
*** ^d is now known as ^demon|away | 19:53 | |
fungi | clarkb: i think i may just resort to copying it somewhere accessible for now (it's not as if the file's sensitive anyway) | 19:53 |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Rename the subscriber map to be a more generic config file. https://review.openstack.org/62073 | 19:54 |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Allow configurable mappings to different LP projects https://review.openstack.org/62074 | 19:54 |
clarkb | fungi: ++ not sensitive | 19:54 |
fungi | okay, it's running | 19:55 |
clarkb | fungi: I think the var needs to have the actual file contents | 19:55 |
*** CaptTofu has quit IRC | 19:55 | |
clarkb | the path won't work there | 19:55 |
jeblair | mriedem: see my comment in https://bugs.launchpad.net/tempest/+bug/1258682 | 19:55 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] | 19:55 |
fungi | ohhhhhhhh | 19:55 |
* fungi totally misread it | 19:56 | |
jeblair | mriedem: not all timeouts are due to the same cause. | 19:56 |
fungi | thanks clarkb | 19:56 |
clarkb | fungi: it is passed literally to puppet on the remote end | 19:56 |
fungi | mmm, nodepool is also like the honey badger when it comes to trapping sigint, i see | 19:57 |
jeblair | mriedem: however, i know of no current infra issues that would contribute to timeouts, so i think we can assume that all _current_ timeouts are due to the unknown bug | 19:57 |
mriedem | jeblair: ok, i just pushed an e-r query for it | 19:57 |
jeblair | sdague: ^ this is a big untracked contributor for gate failures | 19:57 |
mriedem | since there are no logs with errors besides console.html, i didn't have much to base the query on | 19:57 |
jeblair | sdague: that makes things worse by taking 45 min jobs to 90 mins | 19:57 |
clarkb | fungi: ya :( will probably need to delete the image build stuff too | 19:57 |
mriedem | jeblair: https://review.openstack.org/#/c/62067/ | 19:58 |
fungi | clarkb: i plan to | 19:58 |
*** dstanek has joined #openstack-infra | 19:59 | |
sdague | mriedem: can yuo change the message part to | 19:59 |
fungi | clarkb: cleaned up... so how about: sudo -u nodepool NODEPOOL_SSH_KEY="`cat /tmp/id_rsa.pub`" nodepoold -d | 19:59 |
fungi | clarkb: is that what you had in mind? | 19:59 |
sdague | message:"Build timed out (after" AND message:"minutes). Marking the build as failed." | 19:59 |
*** jaypipes has quit IRC | 19:59 | |
sdague | so it catches all the job timeouts, not just the ones set to 90 minutes | 20:00 |
*** dcramer_ has quit IRC | 20:00 | |
clarkb | fungi: ya, see the nodepool readme, that is basically what it does | 20:00 |
clarkb | sdague: that query is even better than mine :) | 20:00 |
jeblair | sdague, mriedem: we'll want to remove the query asap after fixing the bug too, because lots of people upload broken code that times out | 20:00 |
fungi | clarkb: founf it. thanks | 20:01 |
fungi | er, found | 20:01 |
sdague | so 75 hits over 7 days actually makes it 9th in the e-r bug list (by count) | 20:02 |
sdague | just to get a sense of relative frequency | 20:02 |
*** lcestari has quit IRC | 20:02 | |
fungi | clarkb: for the benefit of our sanity, i checked the log and nodepool *does* think it needs two nodes, so could be an off-by-one/rounding error, or maybe that's an effect of the load prediction heuristic | 20:03 |
clarkb | fungi: so if I sudo nodepool list I should see the data from the foreground process right? since this is all in the DB | 20:03 |
clarkb | fungi: weird | 20:03 |
fungi | clarkb: image-list at the moment | 20:03 |
fungi | clarkb: list will start showing content once the image finishes building | 20:03 |
clarkb | fungi: I wonder if the heuristic will always do one + what it determines | 20:04 |
clarkb | or some other silly off by one error | 20:04 |
fungi | clarkb: i also have both commands being called every 60 seconds under watch in the second window of that screen session | 20:04 |
mriedem | sdague: yeah, i can update the message | 20:04 |
fungi | clarkb: wild guess would be that it's rounding up from very small values of 1 ;) | 20:05 |
*** eharney has quit IRC | 20:05 | |
* fungi has not looked back at that section of the code to make a more reasoned guess | 20:06 | |
jeblair | sdague: yeah, just pointing out that i think the 2x runtime factor aggravates its severity (when it hits, it has the same throughput effect of hitting twice in a row). | 20:06 |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 20:06 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:06 |
sdague | yep, definitely | 20:06 |
jeblair | clarkb, fungi: is there something i can help elucidate? | 20:06 |
*** harlowja has quit IRC | 20:07 | |
sdague | also, the folks trying to land stable/grizzly patches that didn't fix their doc jobs are a huge problem now as well | 20:07 |
fungi | jeblair: min-ready is set to 1 and nodepool believes it needs 2 nodes | 20:07 |
mriedem | 219 hits > 77 hits: | 20:07 |
mriedem | http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQnVpbGQgdGltZWQgb3V0IChhZnRlclwiIEFORCBtZXNzYWdlOlwibWludXRlcykuIE1hcmtpbmcgdGhlIGJ1aWxkIGFzIGZhaWxlZC5cIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4Njk2NTA5OTI1NX0= | 20:07 |
sdague | as a neutron job will reset ahead of them, then they'll be put back into the zuul queue | 20:07 |
mriedem | good call sda | 20:07 |
mriedem | sdague: * | 20:07 |
clarkb | jeblair: at this point I don't think so | 20:07 |
anteaya | sdague: the neutron job being the ssh bug? | 20:07 |
sdague | anteaya: the ssh bug that I pointed out as the top bug yesterday | 20:08 |
anteaya | yes | 20:08 |
fungi | clarkb: jeblair: i'm less concerned with nodepool math problems at the moment and just want to make sure we have all the moving parts in place on jenkins-dev | 20:08 |
anteaya | which 4 devs are working on now | 20:08 |
sdague | mriedem: actually, that's catching a ton of swift issues | 20:08 |
sdague | in their unit tests | 20:08 |
anteaya | continuing from yesterday | 20:08 |
sdague | so I'm not sure that was a good call | 20:08 |
jeblair | fungi: but i'm curious, what's the math problem? | 20:08 |
notmyname | sdague: ? (swift ping) | 20:08 |
fungi | jeblair: min-ready is set to 1 and nodepool believes it needs 2 nodes | 20:09 |
fungi | jeblair: with no jobs underway on jenkins-dev | 20:09 |
jeblair | fungi: can i see the debug output from the allocator? | 20:09 |
mriedem | sdague: clarkb: like this: http://logs.openstack.org/15/60215/2/check/gate-swift-python26/1b3754e/console.html | 20:09 |
fungi | jeblair: probably so. i'll fish it out | 20:09 |
fungi | jeblair: scratch that | 20:09 |
mriedem | http://logs.openstack.org/15/60215/2/check/gate-swift-python26/1b3754e/console.html#_2013-12-13_19_31_29_115 | 20:09 |
sdague | mriedem: yes | 20:09 |
*** dprince has quit IRC | 20:10 | |
fungi | jeblair: clarkb: jobs are actually underway on jenkins-dev, just none i would expect to need these nodepool nodes | 20:10 |
fungi | anyway, getting debug output | 20:10 |
mriedem | sdague: so going back to the strict message i had first | 20:10 |
sdague | mriedem: yeh | 20:10 |
praneshp | hey clarkb: thanks a lot, my review passed jenkins | 20:10 |
praneshp | *patch | 20:10 |
mfer | clarkb any chance i could get you to look at https://review.openstack.org/#/c/62069/ ... or maybe there's someone else i could hit up | 20:10 |
clarkb | praneshp: np | 20:10 |
jeblair | mriedem, sdague: current timeout values for d-g jobs are 60,90,120 | 20:11 |
jeblair | mriedem, sdague: we could change them to 61,91,121 for better fingerprinting | 20:11 |
sdague | you could match job name | 20:11 |
clarkb | mfer: currently busy trying to make jenkins more reliable. also manage-projects is still giving us grief... probably won't get to it today | 20:11 |
jeblair | sdague: oh, right, that's a different field so you can match it. that's better. :) | 20:12 |
sdague | build_name is a valid thing to match | 20:12 |
mfer | clarkb kk | 20:12 |
fungi | jeblair: clarkb: debug log from daemon start to end of demand analysis... http://paste.openstack.org/show/54971 | 20:12 |
fungi | jeblair: clarkb: so i think that's our answer | 20:13 |
clarkb | oh it has jobs queued | 20:13 |
fungi | it wants to run some jobs on them ;) | 20:13 |
clarkb | fungi: that is good, it means we will get end to end testing :) | 20:13 |
fungi | mystery solved | 20:13 |
mriedem | sdague: jeblair: but can you do ORs? | 20:13 |
clarkb | mriedem: yes | 20:13 |
sdague | mriedem: yes | 20:14 |
sdague | and you can use parens to group | 20:14 |
mriedem | can i use ternary operators? :) | 20:14 |
jeblair | fungi: cool. error: situation normal. :) | 20:14 |
fungi | clarkb: i'm going to clear old nodepool nodes manually out of jenkins-dev too | 20:14 |
clarkb | fungi: o | 20:14 |
fungi | jeblair: yes, very much so | 20:14 |
clarkb | *ok | 20:14 |
clarkb | mriedem: http://lucene.apache.org/core/2_9_4/queryparsersyntax.html | 20:15 |
clarkb | that is for an older version of lucene but I think the query syntax hasn't changed | 20:15 |
*** Abhishe__ has quit IRC | 20:15 | |
*** UtahDave1 has joined #openstack-infra | 20:16 | |
*** mrodden has quit IRC | 20:16 | |
*** UtahDave has quit IRC | 20:17 | |
*** UtahDave1 is now known as UtahDave | 20:17 | |
fungi | clarkb: watching jenkins-dev, i think we still have some old periodic jobs which need to be deleted from it | 20:17 |
Alex_Gaynor | dhellmann: Good catch -- I totally missed the existing +2 | 20:17 |
Alex_Gaynor | (and then I missed the follow up comment, not doing real well today am I?) | 20:17 |
clarkb | fungi: probably | 20:17 |
dhellmann | Alex_Gaynor: yeah, I do that sometimes so I figured that's what it was | 20:17 |
fungi | clarkb: particularly the devstack-reap-vms-* jobs and similar | 20:17 |
* fungi fixes | 20:17 | |
fungi | though hopefully they no longer match the new nodepool names | 20:18 |
*** mrodden has joined #openstack-infra | 20:18 | |
mriedem | message:"Build timed out (after" AND message:"minutes). Marking the build as failed." AND (build_name:"check-tempest-dsvm-postgres-full" OR build_name:"check-tempest-dsvm-full") AND filename:"console.html" | 20:19 |
*** zehicle_at_dell has joined #openstack-infra | 20:19 | |
sdague | mriedem: s/check/gate/ ? | 20:20 |
mriedem | sdague: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQnVpbGQgdGltZWQgb3V0IChhZnRlclwiIEFORCBtZXNzYWdlOlwibWludXRlcykuIE1hcmtpbmcgdGhlIGJ1aWxkIGFzIGZhaWxlZC5cIiBBTkQgKGJ1aWxkX25hbWU6XCJjaGVjay10ZW1wZXN0LWRzdm0tcG9zdGdyZXMtZnVsbFwiIE9SIGJ1aWxkX25hbWU6XCJjaGVjay10ZW1wZXN0LWRzdm0tZnVsbFwiKSBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQ | 20:20 |
mriedem | sdague: should i duplicate the build_names in the query for check and gate? | 20:21 |
mriedem | otherwise the e-r query won't hit on check failures and people will have to hunt for it | 20:21 |
sdague | clarkb: do we have globbing in that field? | 20:21 |
clarkb | sdague: yes, but not at the beginning of the field (lucene limitation) | 20:21 |
clarkb | also you have to remove the quotes to glob so | 20:22 |
clarkb | check-tempest-* OR gate-tempest-* should work | 20:22 |
clarkb | I wish I could will the image build to go faster :) | 20:22 |
mriedem | message:"Build timed out (after" AND message:"minutes). Marking the build as failed." AND (build_name:check-tempest-* OR build_name:gate-tempest-*) AND filename:"console.html" | 20:23 |
sdague | message:"Build timed out (after" AND message:"minutes). Marking the build as failed." AND filename:"console.html" AND (build_name:gate-tempest* OR build_name:check-tempest*) | 20:23 |
mriedem | essentially the same | 20:23 |
sdague | yeh, that | 20:23 |
mriedem | i'm back to my original number of hits | 20:23 |
mriedem | so looks good | 20:23 |
sdague | yep, we were going at it the same time | 20:23 |
sdague | yep, looks solid | 20:24 |
sdague | push that and I'll land it | 20:24 |
sdague | only 5 hits in the gate | 20:24 |
sdague | which is nice | 20:24 |
*** jcooley_ has quit IRC | 20:24 | |
sdague | so it's not actually resetting much | 20:24 |
clarkb | oh grenade | 20:24 |
clarkb | should add grenade because that is timing out a bunch, right? | 20:25 |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 20:25 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:25 |
*** zehicle has joined #openstack-infra | 20:26 | |
jeblair | mriedem: ^ see clarkb's comment | 20:28 |
sdague | so I'm landing mriedem's current patch, but a follow up to add would be accepted | 20:28 |
*** zehicle_at_dell has quit IRC | 20:29 | |
mriedem | check-grenade-* and gate-grenade-* right? | 20:29 |
mordred | backscroll! | 20:30 |
mordred | also | 20:30 |
mordred | the internet works | 20:30 |
mordred | I can type | 20:30 |
mordred | so happy | 20:30 |
mordred | morning everyone | 20:30 |
*** rcarrillocruz has joined #openstack-infra | 20:30 | |
clarkb | mordred: good morning | 20:30 |
clarkb | fungi: we have an image id! | 20:30 |
mfer | mordred good morning | 20:30 |
*** rongze has joined #openstack-infra | 20:31 | |
clarkb | fungi: almost done building I think | 20:31 |
*** rcarrillocruz1 has quit IRC | 20:31 | |
sdague | mriedem: yes | 20:31 |
anteaya | morning mordred | 20:32 |
sdague | also, just a style thing, I've been putting the conjunctions after the break | 20:32 |
*** Ryan_Lane has quit IRC | 20:33 | |
anteaya | looking at the gerrit merge log, once 24 hours has passed is there a way of seeing the merges that happened at a specific timestamp | 20:33 |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add grenade jobs to the bug 1258682 e-r query https://review.openstack.org/62084 | 20:33 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:33 |
anteaya | or at least to the closest hour? | 20:33 |
mriedem | clarkb: sdague: https://review.openstack.org/62084 | 20:33 |
anteaya | once 0000 utc occurs everything is just attributed to the same date | 20:33 |
jeblair | anteaya: you can use an ssh query | 20:33 |
jeblair | anteaya: or the git log. or the git log for openstack/openstack. | 20:33 |
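Hedged examples of the two approaches jeblair mentions; the project name and time window below are placeholders, and the Gerrit host/port are the usual defaults:

    # ssh query: merged changes for a project; each JSON record carries a
    # lastUpdated timestamp that can be filtered on
    ssh -p 29418 review.openstack.org gerrit query --format=JSON status:merged project:openstack-infra/config limit:25

    # git log: merge commits in a local clone within a given window
    git log --merges --since="2013-12-12 00:00" --until="2013-12-13 00:00" --format="%h %ci %s"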
anteaya | okay thanks | 20:34 |
sdague | mriedem: landed | 20:34 |
*** gyee has quit IRC | 20:35 | |
mordred | jeblair: in the airport this morning, jog0 requested that we add the infra repos to openstack/openstack - I think it might be an interesting idea - possibly in an infra subdir to be clear about what they are | 20:35 |
fungi | anteaya: yes, like i suggested yesterday, you can see it in cgit if you like browsers... http://git.openstack.org/cgit/openstack/oslo.messaging/log/ | 20:36 |
fungi | (otherwise, the git log command) | 20:36 |
jeblair | mordred: why? | 20:36 |
mordred | jeblair: but his specific question was that when he's trying to track down when something started acting wonky, he's been using o/o to walk backwards and look at system state | 20:36 |
anteaya | fungi: yes, thanks | 20:36 |
clarkb | fungi: waiting for the image to leave the building state is like watching paint dry | 20:36 |
*** rongze has quit IRC | 20:36 | |
mordred | jeblair: so knowing what various infra things looked like around the time of commit X was a thing he was looking to be able to do | 20:36 |
clarkb | fungi: just want to ready, we have two slaves building | 20:36 |
fungi | yup | 20:37 |
*** zehicle has quit IRC | 20:37 | |
jeblair | mordred: that has limited utility with infra; most of our changes take effect between 10 and 1440 minutes after the commit lands... | 20:37 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add e-r query for bug 1258682 https://review.openstack.org/62067 | 20:37 |
*** jcooley_ has joined #openstack-infra | 20:37 | |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:37 |
mordred | jeblair: yeah - that's what I said - and devstack and devstack-gate are already in there | 20:38 |
mordred | but I guess there are potentially things in config, such as job changes, that might be helpful to look at? I feel non-strongly in either direction | 20:38 |
jeblair | mordred: i don't think that was really the intent behind openstack/openstack (i mean, gee, we could just log gerrit merges if that's what's needed) | 20:39 |
*** jcooley_ has quit IRC | 20:39 | |
*** zehicle_at_dell has joined #openstack-infra | 20:39 | |
openstackgerrit | Michael Still proposed a change to openstack-infra/config: Add project configuration. https://review.openstack.org/62085 | 20:39 |
mordred | jeblair: nod | 20:39 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add grenade jobs to the bug 1258682 e-r query https://review.openstack.org/62084 | 20:40 |
uvirtbot | Launchpad bug 1258682 in tempest "timeout causing gate-tempest-dsvm-full to fail" [Undecided,Invalid] https://launchpad.net/bugs/1258682 | 20:40 |
jeblair | mordred: so i'm inclined to say "let's not" and if jog0 is extremely persuasive that it's totally useful and he's solved all kinds of problems by having the infra git log on the screen with the openstack git log, maybe we look at doing that or a git merge log thing... | 20:40 |
mordred | jeblair: kk. works for me | 20:40 |
sdague | jeblair: now that you have your awesome priority tool - could you bump this to the top of the queue - https://review.openstack.org/#/c/61428/ ? | 20:42 |
clarkb | fungi: we have slaves! | 20:42 |
sdague | markmcclain thinks that may solve the ssh race (or at least make it a ton better) | 20:42 |
fungi | in jenkins-dev and everything | 20:42 |
clarkb | fungi: they aren't running jobs like I expected though | 20:42 |
fungi | clarkb: but not actually running any jobs | 20:42 |
sdague | basically a neutron + nova change set pair needed to land, the neutron one did, the nova one did not | 20:42 |
fungi | jinx | 20:42 |
*** melwitt has joined #openstack-infra | 20:42 | |
sdague | the massive uptick corresponds to the neutron one landing | 20:43 |
*** eharney has joined #openstack-infra | 20:43 | |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Rename the subscriber map to be a more generic config file. https://review.openstack.org/62073 | 20:43 |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Allow configurable mappings to different LP projects https://review.openstack.org/62074 | 20:43 |
jeblair | sdague: ack, i'll start on that. | 20:43 |
sdague | it's speculation, but good speculation | 20:43 |
clarkb | fungi: With that in place I am going to grab lunch very quickly | 20:43 |
fungi | sdague: i'm betting it was for the CVE-2013-6419 fix? | 20:44 |
uvirtbot | fungi: ** RESERVED ** This candidate has been reserved by an organization or individual that will use it when announcing a new security problem. When the candidate has been publicized, the details for this candidate will be provided. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-6419) | 20:44 |
fungi | clarkb: k | 20:44 |
sdague | fungi: it's your patch... so you tell me :) | 20:44 |
fungi | sdague: then yes | 20:44 |
clarkb | fungi: maybe we should just manually trigger some jobs on there, I think that will be sufficient for the nodepool node removal stuff to happen | 20:44 |
clarkb | (by manually I mean via gearman) | 20:44 |
jeblair | zuul promote --pipeline gate --changes 61428,2 | 20:44 |
fungi | sdague: i had to pester the nova devs a but for approval on their half, so it lost a day and the neutron part went in first | 20:45 |
fungi | er, a bit | 20:45 |
fungi | jeblair: magic! | 20:46 |
dkehn | clarkb: quickq: with the cirros-0.3.1-x86_64 images, when trying to ssh into them what is username/password? | 20:46 |
mordred | "clarkb | fungi: we have slaves!" | 20:46 |
mordred | clarkb, fungi: does that mean nodepool static slave replacement? | 20:46 |
fungi | mordred: if only it was meant the way you read it | 20:46 |
fungi | mordred: though yes, we do | 20:46 |
fungi | mordred: there are already several infra jobs dogfooding on the nodepool bare slaves | 20:47 |
*** melwitt has quit IRC | 20:47 | |
fungi | mordred: but we were talking about nodepool dev slaves on jenkins-dev | 20:47 |
jeblair | mordred: fungi and clarkb are working on jenkins-dev; we are using nodepool slaves for some infra jobs, but not generally yet | 20:47 |
mordred | neat | 20:48 |
* mordred is very behind - but thinks everyone is great | 20:48 | |
jeblair | clarkb, fungi, mordred: fyi the zuul promote command waits for the queue to be completely reset before returning; that means it can take a while. | 20:49 |
fungi | jeblair: noted | 20:49 |
mordred | jeblair: thanks | 20:49 |
mordred | jeblair: also, baller command | 20:49 |
*** talluri has quit IRC | 20:50 | |
openstackgerrit | Michael Still proposed a change to openstack-infra/config: Add project configuration. https://review.openstack.org/62085 | 20:50 |
*** esker has joined #openstack-infra | 20:50 | |
fungi | i did like "jump the queue" but shortened to just "jump" it lost a bit of its contextual meaning as a command-line | 20:50 |
*** talluri has joined #openstack-infra | 20:50 | |
jeblair | 6 minutes in this case | 20:50 |
dkranz | clarkb: Sorry, was away. I think your logic was fine and I didn't change it. But unlike previous attempts I tried to run the code locally and got syntax errors that I could not figure out. | 20:51 |
jeblair | fungi: yeah, promote was the only one that read correctly to me as a direct object | 20:51 |
*** vkozhukalov has joined #openstack-infra | 20:51 | |
fungi | wfm | 20:51 |
fungi | more important is that it does what we want, which it seems to | 20:51 |
dkranz | clarkb: So I pushed the same logic using the subset of bash I actually understand. | 20:51 |
fungi | jeblair: clarkb: presumably we should be using a modified nodepool node name for the slaves added to jenkins-dev so we can tell them apart in a nova list more easily? | 20:52 |
dkranz | clarkb: This is an important change so at this point I suggest accepting what I pushed if it is correct, or some one who really knows bash just take over the patch. | 20:52 |
*** melwitt has joined #openstack-infra | 20:53 | |
jeblair | fungi: it would be nice, though that affects jjb and zuul config. not sure the right answer, but i won't be upset if we accidentally delete a dev slave. | 20:54 |
fungi | jeblair: okay, i won't worry too much about it for now | 20:54 |
*** talluri has quit IRC | 20:54 | |
fungi | and yeah, the stability of these slaves is beneath concern | 20:55 |
*** harlowja has joined #openstack-infra | 20:56 | |
*** sdake_ is now known as sdake-OOO | 20:56 | |
*** sdake is now known as sdake-OOO2 | 20:57 | |
*** dolphm has quit IRC | 20:59 | |
*** zehicle_at_dell has quit IRC | 20:59 | |
jeblair | we should really get rid of gate-noop before going to all-dynamic slaves | 21:00 |
dkehn | clarkb: quickq: with the cirros-0.3.1-x86_64 images, when trying to ssh into them what is username/password? | 21:00 |
fungi | dkehn: clarkb is out to lunch, but the internets tell me that you can log in as the "cirros" user with a password of "cubswin:)" | 21:02 |
fungi | someone is obviously a chicagoan | 21:02 |
dkehn | fungi: thxs | 21:02 |
fungi | np | 21:03 |
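In case anyone needs the full recipe later: logging in to a booted cirros guest is ordinary password-based ssh with that account (the instance address below is a placeholder):

    # log in as the "cirros" user and enter the password at the prompt
    ssh cirros@203.0.113.10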
*** Ryan_Lane has joined #openstack-infra | 21:04 | |
*** jcooley_ has joined #openstack-infra | 21:05 | |
*** AaronGr is now known as AaronGr_afk | 21:13 | |
*** mfer_ has joined #openstack-infra | 21:15 | |
*** mfer has quit IRC | 21:16 | |
*** mfer_ has quit IRC | 21:16 | |
*** mfer has joined #openstack-infra | 21:16 | |
*** mfer has quit IRC | 21:17 | |
*** mfer has joined #openstack-infra | 21:17 | |
*** ArxCruz has joined #openstack-infra | 21:18 | |
*** mfer has quit IRC | 21:19 | |
*** zehicle_at_dell has joined #openstack-infra | 21:20 | |
*** mfer has joined #openstack-infra | 21:20 | |
*** smarcet has left #openstack-infra | 21:20 | |
*** mfer has quit IRC | 21:20 | |
*** mfer has joined #openstack-infra | 21:21 | |
clarkb | I am back | 21:21 |
fungi | clarkb: i hacked up a copy of trigger-job.py in my homedir on jenkins-dev and tried to use it to inject the parameters for https://jenkins01.openstack.org/job/gate-tempest-dsvm-full/2194/parameters/ (third window of the screen session there) but no luck, just sits and no slave gets assigned. what bits may i be missing? | 21:22 |
*** mfer has quit IRC | 21:22 | |
*** mfer has joined #openstack-infra | 21:23 | |
clarkb | fungi: I don't think jenkins-dev knows about that job | 21:23 |
fungi | oh, hurr | 21:23 |
clarkb | https://jenkins-dev.openstack.org/job/gate-tempest-devstack-vm-full/ is the job it knows about | 21:23 |
fungi | yeah | 21:24 |
fungi | i guess we need to rerun jjb on it? | 21:24 |
clarkb | fungi: maybe | 21:25 |
fungi | or i can just sub out the job name | 21:25 |
fungi | trying that first | 21:25 |
clarkb | ok | 21:25 |
fungi | aha, node labels | 21:27 |
*** syerrapragada has joined #openstack-infra | 21:28 | |
*** changbl_ has quit IRC | 21:29 | |
*** changbl has joined #openstack-infra | 21:29 | |
*** ^demon|away is now known as ^d | 21:29 | |
dkranz | clarkb: Did you see my comments above? | 21:30 |
*** alcabrera has quit IRC | 21:30 | |
fungi | clarkb: well, no dice. i changed that job to look for devstack-precise (which matches the label on those nodes) rather than dev-devstack-precise, then retried to trigger the job, but still not much going on | 21:31 |
*** gyee has joined #openstack-infra | 21:31 | |
*** ArxCruz has quit IRC | 21:31 | |
fungi | i wonder if the parameter list for the job needs to match the parameters i'm passing with the script now :/ | 21:32 |
*** vkozhukalov has quit IRC | 21:33 | |
clarkb | oh could be | 21:33 |
clarkb | fungi: also is gearman hooked up properly/ | 21:34 |
fungi | ooh, good question | 21:34 |
* fungi checks the plugin | 21:34 | |
*** jasond has joined #openstack-infra | 21:35 | |
fungi | installed, though a couple of revs behind | 21:35 |
jasond | is something wrong with the gate jobs? this seems to be stuck https://review.openstack.org/#/c/59851/ | 21:35 |
fungi | clarkb: we should probably update that anyway from a proper new-jenkins testing perspective | 21:36 |
*** esker has quit IRC | 21:37 | |
fungi | jasond: stuck how? i see it being tested in the gate pipeline on http://status.openstack.org/zuul/ | 21:37 |
clarkb | fungi: ++ | 21:37 |
fungi | clarkb: updating it now | 21:38 |
jasond | fungi: it still says "Need Verified" after a reverify 5 hours ago. so it's working as expected? | 21:39 |
fungi | jasond: yes, that means it's in the gate for testing. there are 17 changes still ahead of it by my count | 21:41 |
*** jcooley_ has quit IRC | 21:41 | |
jasond | fungi: oh ok. thanks for checking | 21:41 |
fungi | depending on how many of those fail, could still be a while | 21:41 |
fungi | sdague: the theory that https://review.openstack.org/61428 would address ssh timeouts seems to be debunked. after being promoted to the head of the gate it failed on gate-tempest-dsvm-neutron on bug 1253896 | 21:44 |
uvirtbot | Launchpad bug 1253896 in tempest "Attempts to verify guests are running via SSH fails. SSH connection to guest does not work." [Critical,Confirmed] https://launchpad.net/bugs/1253896 | 21:44 |
jasond | fungi: i noticed that jenkins' vote has been removed since the last reverify. do i need to recheck again? | 21:44 |
fungi | jasond: that gets removed automatically, until gate testing concludes | 21:44 |
fungi | then it will either get a green checkmark or a red x in that column | 21:45 |
jasond | fungi: okay, thanks | 21:45 |
openstackgerrit | Elizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add 2 new ci publication branches to gerritbot https://review.openstack.org/62095 | 21:46 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Process logs with CRM114 https://review.openstack.org/62096 | 21:47 |
jeblair | clarkb, fungi, mordred: ^ | 21:47 |
anteaya | fungi: markmcclain has another candidate | 21:48 |
jeblair | clarkb, fungi, mordred: crm114 is a fun language. :) | 21:48 |
anteaya | fungi: he is in a meeting right now and then will address it | 21:49 |
fungi | jeblair: i expect to set aside some time this weekend to revel in it | 21:49 |
fungi | anteaya: thanks for the update. i was mainly just passing along the result | 21:49 |
anteaya | yeah | 21:49 |
*** AaronGr_afk is now known as AaronGr | 21:49 | |
jeblair | "Because the commonest use of LIAF is in iteration, LIAF means Loop Iterate Awaiting Failure. If that's too hard to remember, just pretend that LIAF is FAIL spelled backwards." | 21:49 |
anteaya | you are correct if it failed on the bug, it is highly unlikely it is the fix for it | 21:50 |
fungi | heh | 21:50 |
clarkb | jeblair: is that from a how to doc? | 21:50 |
*** harlowja has quit IRC | 21:50 | |
clarkb | fungi: where are you running the gearman client? | 21:50 |
fungi | clarkb: locally on jenkins-dev... should i not? | 21:51 |
clarkb | fungi: I just did a netstat -ln and don't see a port 4730 listening. is zuul-dev still a thing? I bet that is where we need to run it | 21:51 |
fungi | clarkb: should be on 127.0.0.1 | 21:51 |
clarkb | fungi: right I think zuul-dev is running the gearman server that jenkins-dev is connected to | 21:52 |
fungi | clarkb: but was just going to surmise we might need one. i believe the jenkins-gearman plugin is going to refuse to activate if it can't connect to a gearman server | 21:52 |
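A quick hedged way to check whether a gear server is up and has workers registered is the gearman admin protocol on port 4730 (assuming nc is available on the host):

    # "status" lists registered functions with queued/running/worker counts,
    # "workers" lists the connected worker clients (e.g. the jenkins masters)
    echo status | nc 127.0.0.1 4730
    echo workers | nc 127.0.0.1 4730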
jeblair | clarkb: it's from a 283 page non-free book. :( | 21:52 |
*** jcooley_ has joined #openstack-infra | 21:52 | |
clarkb | fungi: yup looks like zuul-dev. I would give your command a shot there | 21:52 |
fungi | clarkb: aha. zuul-dev *does* exist. i'll try there | 21:52 |
fungi | i just found it as well | 21:53 |
clarkb | jeblair: you'll just have to explain everything then :) | 21:53 |
*** SergeyLukjanov_ has quit IRC | 21:53 | |
jeblair | clarkb: it's distributed with the project. i dunno what the licensing deal is with the book. fortunately, the software is clear. ;) | 21:54 |
*** vkozhukalov has joined #openstack-infra | 21:54 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Process logs with CRM114 https://review.openstack.org/62096 | 21:54 |
jeblair | requisite pep8 fix ^ | 21:55 |
clarkb | jeblair: oh I see, the book is available just not free | 21:55 |
fungi | clarkb: oh, after the jenkins-dev restart, nodepool deleted those slaves so it'll be a bit before new ones are enrolled | 21:56 |
*** jcooley_ has quit IRC | 21:57 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Process logs with CRM114 https://review.openstack.org/62096 | 21:57 |
clarkb | fungi: was it supposed to delete them? | 21:57 |
fungi | clarkb: dunno, but the age on them is about right | 21:57 |
*** CaptTofu has joined #openstack-infra | 21:58 | |
clarkb | fungi: that seems odd to me, but can probably be ignored for now | 21:58 |
fungi | aha, i think it may be having trouble reconnecting to the gearman plugin on jenkins-dev | 21:58 |
anteaya | markmcclain feels that this patch: https://review.openstack.org/#/c/62098/ (a reversion of an rpc patch) may address bug 1253896 | 21:58 |
uvirtbot | Launchpad bug 1253896 in tempest "Attempts to verify guests are running via SSH fails. SSH connection to guest does not work." [Critical,Confirmed] https://launchpad.net/bugs/1253896 | 21:58 |
anteaya | any chance of it getting a priority in the check queue? | 21:59 |
fungi | clarkb: anyway, i'm being paged to go out to dinner now, so i'll have to continue this once i return | 21:59 |
fungi | but i'll restart the nodepoold first | 21:59 |
*** mfer has quit IRC | 21:59 | |
*** harlowja has joined #openstack-infra | 22:00 | |
openstackgerrit | Elizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add 2 new ci publication branches to gerritbot https://review.openstack.org/62095 | 22:00 |
pleia2 | sneaky whitespace | 22:00 |
*** sarob has joined #openstack-infra | 22:00 | |
clarkb | fungi: ok, I can try triggering the job by hand over on zuul-dev | 22:01 |
*** esker has joined #openstack-infra | 22:01 | |
*** jcooley_ has joined #openstack-infra | 22:01 | |
*** rcarrillocruz has quit IRC | 22:02 | |
fungi | clarkb: what i was *going* to run is... (in ~/zuul with . venv/bin/activate) ./tools/trigger-job.py --job gate-tempest-devstack-vm-full --project openstack/nova --pipeline gate --newrev 1436c1707a127dc82136b1046934c8a56b558a0a --refname refs/zuul/master/Z2d5c1f5108fa490a8971e381fd423a09 --logpath 28/61428/2/gate/gate-tempest-devstack-vm-full/7e1d10e,2 | 22:02 |
fungi | note that the zuul on zuul-dev is too old to have trigger-job.py | 22:02 |
fungi | so it may also be too old to support it, not sure yet | 22:02 |
fungi | oh, and i just realized i didn't modify it to support all the additional parameters a gate job would want like i did the copy i was initially testing on jenkins-dev | 22:03 |
*** praneshp has quit IRC | 22:03 | |
fungi | but if you want it, it's in the same place in my homedir there | 22:03 |
jeblair | fungi: trigger-job doesn't affect zuul, it goes straight to the worker | 22:03 |
clarkb | fungi: ok thanks | 22:04 |
*** esker has quit IRC | 22:04 | |
fungi | jeblair: right, okay then it should be fine | 22:04 |
* fungi vanishes | 22:04 | |
openstackgerrit | A change was merged to openstack-infra/statusbot: Set world-readable permissions on alert file https://review.openstack.org/61588 | 22:05 |
jeblair | clarkb: let me know if you need anything | 22:06 |
clarkb | jeblair: will do, btw looking at the crm114 change I like how simple the actual mechanics of it are. Next step is CRM114 as a service? :) | 22:07 |
*** resker has joined #openstack-infra | 22:07 | |
clarkb | currently waiting for nodepool to spin up two slaves that I can trigger jobs against | 22:07 |
clarkb | it is running a job \o/ | 22:08 |
clarkb | I didn't have to do anything | 22:08 |
jeblair | clarkb: heh :) note there's a level there too -- we can disable the filter by removing it from the config yaml | 22:08 |
jeblair | s/level/lever/ | 22:08 |
*** praneshp has joined #openstack-infra | 22:08 | |
clarkb | slave 19 is running a devstack job | 22:09 |
hemanth_ | Hi, can anyone help me with this? http://logs.openstack.org/14/59814/8/check/gate-tempest-dsvm-neutron-large-ops/ee6bfe0/console.html | 22:09 |
hemanth_ | not really sure what that means | 22:09 |
clarkb | hemanth_: http://logs.openstack.org/14/59814/8/check/gate-tempest-dsvm-neutron-large-ops/ee6bfe0/logs/screen-g-api.txt.gz notice in the console log it was attempting to start glance when it failed | 22:10 |
*** thomasem has quit IRC | 22:11 | |
hemanth_ | clarkb: oops, thanks so much for pointing it! | 22:13 |
clarkb | jeblair: https://jenkins-dev.openstack.org/job/gate-tempest-devstack-vm-full/7896/console do we actually expect those jobs to run successfully? I think it may be too old | 22:14 |
clarkb | jeblair: but the job did run and nodepool put the slave into the delete state | 22:14 |
clarkb | and removed it from jenkins | 22:14 |
*** openstackstatus has quit IRC | 22:14 | |
*** openstackstatus_ has joined #openstack-infra | 22:14 | |
clarkb | slave is now completely gone from jenkins-dev | 22:14 |
*** openstackstatus_ is now known as openstackstatus | 22:14 | |
*** resker has quit IRC | 22:14 | |
jeblair | clarkb: yeah, i think that ip might be an old machine i was running | 22:15 |
jeblair | clarkb: long gone. so yeah, i wouldn't worry about the jobs themselves, just the mechanics around them. | 22:15 |
clarkb | jeblair: yeah the SCP thing doesn't bother me | 22:15 |
*** prad has quit IRC | 22:15 | |
clarkb | devstack stopping so quickly does bother me a bit | 22:15 |
*** openstackstatus has quit IRC | 22:15 | |
*** openstackstatus_ has joined #openstack-infra | 22:15 | |
*** openstackstatus_ is now known as openstackstatus | 22:16 | |
clarkb | jeblair: anything else you think we should look at before planning some rolling upgrades? | 22:16 |
jeblair | clarkb: it probably tried to fetch a zuul ref from prod | 22:16 |
jeblair | clarkb: (new ZUUL_URL feature could help with that) | 22:16 |
clarkb | old zuul refs maybe | 22:16 |
*** harlowja has quit IRC | 22:16 | |
clarkb | oh from review.o.o? | 22:16 |
jeblair | clarkb: no i mean i think the jobs are the same jobs as in production, so it tried to fetch from zuul.o.o not zuul-dev.o.o | 22:16 |
clarkb | oh right | 22:17 |
*** openstackstatus has quit IRC | 22:17 | |
*** openstackstatus_ has joined #openstack-infra | 22:17 | |
*** openstackstatus_ is now known as openstackstatus | 22:17 | |
jeblair | i'll go fix statusbot | 22:17 |
*** openstackstatus has quit IRC | 22:17 | |
*** openstackstatus has joined #openstack-infra | 22:18 | |
*** openstackstatus has quit IRC | 22:18 | |
zaro | clarkb: https://issues.jenkins-ci.org/browse/JENKINS-21006 | 22:18 |
*** openstackstatus has joined #openstack-infra | 22:18 | |
jeblair | zaro: neat, thanks | 22:19 |
jeblair | clarkb: do a jjb run? delete at least one job from the cache so it does something.. | 22:20 |
jeblair | clarkb: other than that, the only thing i would think is burn-in -- do we want to leave it running for a few days to see if it leaks or explodes with nodepool annoying it all the time? | 22:22 |
clarkb | jeblair: we can | 22:22 |
clarkb | jeblair: looks like you ran JJB by hand on jenkins-dev. doing that now | 22:23 |
*** harlowja has joined #openstack-infra | 22:24 | |
clarkb | jeblair: we don't have jjb running periodically out of a system location there, so I won't worry about cache and just apply all the jobs | 22:24 |
jeblair | k | 22:25 |
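For reference, a by-hand jenkins-job-builder run of the sort being described looks roughly like the following; the cache and config paths are assumptions rather than the exact locations on jenkins-dev:

    # drop the local jjb cache so jobs are re-pushed even if unchanged
    rm -rf ~/.cache/jenkins_jobs/
    # then apply the job definitions to the master named in the ini file
    jenkins-jobs --conf /etc/jenkins_jobs/jenkins_jobs.ini update /etc/jenkins_jobs/config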
*** jcooley_ has quit IRC | 22:25 | |
*** esker has joined #openstack-infra | 22:25 | |
jeblair | clarkb: if you want to start with the rolling upgrades without burning in on -dev, that's fine. we do have 2 masters. | 22:25 |
*** AlexF has joined #openstack-infra | 22:25 | |
*** jerryz has quit IRC | 22:25 | |
clarkb | jeblair: part of me wants to, the other part of me realizes the weekend is near | 22:25 |
*** harlowja has quit IRC | 22:27 | |
openstackgerrit | A change was merged to openstack-infra/config: Fix serving alert json file on eavesdrop https://review.openstack.org/61593 | 22:28 |
*** harlowja has joined #openstack-infra | 22:28 | |
openstackgerrit | A change was merged to openstack-infra/config: Don't re-exec in check-dg-tempest-dsvm-full https://review.openstack.org/61569 | 22:29 |
*** resker has joined #openstack-infra | 22:32 | |
*** esker_ has joined #openstack-infra | 22:33 | |
*** esker has quit IRC | 22:34 | |
clarkb | JJB is creating a bunch of jobs, seems to be happy | 22:35 |
clarkb | jeblair: maybe we upgrade jenkins.o.o today then do 01 and 02 monday? | 22:35 |
*** vkozhukalov has quit IRC | 22:35 | |
clarkb | that will give us a bit more burn in on less active machines | 22:35 |
*** denis_makogon_ has joined #openstack-infra | 22:35 | |
jeblair | clarkb: i'd rather do jenkins.o.o last since it's not HA | 22:35 |
clarkb | oh good point | 22:35 |
clarkb | reverting is relatively easy, I am very tempted to go ahead and try 01 | 22:36 |
*** resker has quit IRC | 22:37 | |
*** AlexF has quit IRC | 22:37 | |
*** dangers is now known as danger_fo_away | 22:39 | |
*** AlexF has joined #openstack-infra | 22:40 | |
*** weshay has quit IRC | 22:40 | |
*** jasond has quit IRC | 22:42 | |
*** paul-- has joined #openstack-infra | 22:42 | |
*** ryanpetrello has quit IRC | 22:44 | |
clarkb | jeblair: JJB seems to have been fine, no apparent errors | 22:48 |
jeblair | clarkb: cool | 22:48 |
clarkb | jeblair: how do you feel about upgrading 01 or 02 today? My only concern is I will be in CA early next week and may not have as much time to babysit then | 22:49 |
*** CaptTofu has quit IRC | 22:49 | |
*** ^d has quit IRC | 22:50 | |
*** CaptTofu has joined #openstack-infra | 22:50 | |
*** esker_ has quit IRC | 22:50 | |
jeblair | clarkb: wfm | 22:50 |
clarkb | ok putting 01 in shutdown mode now | 22:52 |
*** rcleere has quit IRC | 22:54 | |
*** esker has joined #openstack-infra | 22:57 | |
*** dkliban has quit IRC | 22:58 | |
jeblair | clarkb: i'll be afk for a while, back in a bit | 22:58 |
*** bpokorny has quit IRC | 22:58 | |
clarkb | jeblair: ok ping me when you are back, hopefully 01 will be quiet by then | 23:01 |
*** mgagne has quit IRC | 23:03 | |
*** sarob has quit IRC | 23:07 | |
*** sarob has joined #openstack-infra | 23:08 | |
*** sarob has quit IRC | 23:09 | |
*** sarob has joined #openstack-infra | 23:09 | |
*** datsun180b has quit IRC | 23:10 | |
*** sarob has quit IRC | 23:11 | |
*** sarob has joined #openstack-infra | 23:11 | |
*** oubiwan__ has quit IRC | 23:12 | |
*** sarob has quit IRC | 23:12 | |
*** gyee has quit IRC | 23:13 | |
*** rcarrillocruz1 has joined #openstack-infra | 23:13 | |
*** sarob has joined #openstack-infra | 23:13 | |
*** rcarrillocruz2 has joined #openstack-infra | 23:14 | |
*** sarob has quit IRC | 23:15 | |
*** rcarrillocruz2 has quit IRC | 23:15 | |
*** sarob has joined #openstack-infra | 23:15 | |
*** AlexF has quit IRC | 23:16 | |
*** sarob has quit IRC | 23:16 | |
*** rcarrillocruz1 has quit IRC | 23:17 | |
*** sarob has joined #openstack-infra | 23:18 | |
*** rnirmal has quit IRC | 23:20 | |
*** fbo is now known as fbo_away | 23:20 | |
*** sarob has quit IRC | 23:21 | |
nikhil__ | hi | 23:21 |
clarkb | hello | 23:21 |
*** sarob has joined #openstack-infra | 23:21 | |
nikhil__ | hey clarkb | 23:21 |
nikhil__ | can you please help me figure out | 23:21 |
nikhil__ | if there's a typo in https://jenkins01.openstack.org/job/check-grenade-dsvm/2036/console ? | 23:22 |
nikhil__ | 2013-12-13 22:51:35.733 | [ERROR] ./grenade.sh:263 Failure in upgrade-glancwe | 23:22 |
clarkb | looks like it | 23:23 |
nikhil__ | that is one of the jenkins runs | 23:23 |
clarkb | git grep glancwe in the grenade repo will show you where | 23:23 |
clarkb | right, but the typo is in grenade | 23:23 |
nikhil__ | oh, is that in the openstack-infra project? | 23:23 |
*** sarob has quit IRC | 23:24 | |
clarkb | no | 23:24 |
*** sarob_ has joined #openstack-infra | 23:24 | |
clarkb | it is an openstack-dev project like devstack | 23:24 |
nikhil__ | oh | 23:24 |
anteaya | nikhil__: http://git.openstack.org/cgit/openstack-dev/grenade/tree/ | 23:24 |
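A hedged sketch of the check clarkb suggests, starting from the repository anteaya linked (the exact clone URL form is an assumption):

    # clone grenade and locate the typo'd target
    git clone https://git.openstack.org/openstack-dev/grenade
    cd grenade
    git grep -n glancwe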
clarkb | jeblair: fungi: jenkins01 will be idle any minute now. Let me know when at least one of you is around. Though I may go ahead and upgrade jenkins01 if I don't hear from you guys in a bit just for the sake of time | 23:25 |
nikhil__ | thanks clarkb anteaya , checking it out now | 23:25 |
*** sarob_ has quit IRC | 23:27 | |
*** blamar has quit IRC | 23:28 | |
*** sarob has joined #openstack-infra | 23:29 | |
*** sarob has quit IRC | 23:30 | |
*** sarob has joined #openstack-infra | 23:31 | |
reed | sarob, to create a new list https://wiki.openstack.org/wiki/Community#Mailing_lists_in_local_languages | 23:32 |
*** sarob has quit IRC | 23:32 | |
jeblair | clarkb: re | 23:33 |
clarkb | jeblair: there is one job on 01 currently running on hpcloud region b. I think it has a couple more minutes | 23:34 |
*** sarob has joined #openstack-infra | 23:34 | |
*** sarob has quit IRC | 23:35 | |
*** praneshp has quit IRC | 23:36 | |
*** sarob has joined #openstack-infra | 23:36 | |
fungi | okay, back... checking scrollback to see where we are | 23:37 |
clarkb | fungi: jenkins-dev seemed happy with nodepool and jjb so I have put jenkins01 in shutdown mode, waiting on one job there before upgrading | 23:37 |
clarkb | fungi: I will be in CA early next week so figured doing this now was beneficial despite being friday | 23:38 |
fungi | yep, great! | 23:38 |
fungi | so once jenkins-dev's nodepool built new slaves it picked up on the corrected node labels i guess? | 23:38 |
clarkb | fungi: I guess, because the jobs started running | 23:38 |
fungi | wondering whether the jenkins-gearman plugin upgrade had anything to do with tat | 23:38 |
fungi | that | 23:38 |
*** praneshp has joined #openstack-infra | 23:39 | |
clarkb | possibly, maybe it couldn't handle the job data being sent previously | 23:39 |
fungi | so you didn't actually have to manually trigger any jobs at all i guess. too awesome | 23:39 |
fungi | interestingly, jenkins-dev has one devstack slave which is already marked offline but is running a tempest job. slightly odd... | 23:40 |
clarkb | fungi: the nodes get marked offline when they start the jobs | 23:40 |
*** sarob has quit IRC | 23:41 | |
fungi | however, it also thinks that tempest job should only take a total of ~2 minutes | 23:41 |
*** sarob has joined #openstack-infra | 23:41 | |
clarkb | fungi: yeah the job is failing, jeblair thinks it is trying to clone zuul refs from zuul.o.o and not zuul-dev | 23:41 |
fungi | ahh, upload timeouts | 23:41 |
clarkb | but the mechanics of add node, delete node, seem fine | 23:41 |
clarkb | fungi: upload timeouts are because jeblair killed the scp endpoint | 23:41 |
fungi | it probably is trying to clone from zuul.o.o | 23:42 |
*** sarob has quit IRC | 23:42 | |
fungi | zuul-dev has too old of a zuul to pass the ZUUL_URL parameter | 23:42 |
clarkb | this regionb slave is taking forever | 23:43 |
*** sarob has joined #openstack-infra | 23:43 | |
clarkb | almost tempted to kill a job and leave a comment on the change apologizing | 23:44 |
*** sarob has quit IRC | 23:45 | |
fungi | clarkb: assuming it's https://jenkins01.openstack.org/job/check-tempest-dsvm-full/2367/ the change already failed another dsvm job anyway | 23:45 |
clarkb | thats the one | 23:45 |
clarkb | ok I will just manually kill it | 23:45 |
*** sarob has joined #openstack-infra | 23:45 | |
clarkb | fungi: want to leave the comment? | 23:45 |
fungi | it failed the postgres-full so it's getting a -1 from check regardless | 23:45 |
fungi | sure | 23:45 |
fungi | nova devs have grown a thick skin, i think ;) | 23:46 |
clarkb | going to give nodepool a minute or so to try and cleanup that node | 23:46 |
fungi | oh, and it's rustlebee's change anyway ;) | 23:46 |
*** sarob has quit IRC | 23:46 | |
clarkb | let me know when you are ready for me to stop jenkins, do the upgrade and start it again | 23:47 |
fungi | i should be nice to him, he did approve vulnerability fixes for me yesterday, after all | 23:47 |
fungi | clarkb: go for it | 23:47 |
clarkb | doing it now | 23:47 |
clarkb | it is starting | 23:48 |
*** sarob has joined #openstack-infra | 23:48 | |
*** sarob has quit IRC | 23:49 | |
*** sarob has joined #openstack-infra | 23:50 | |
clarkb | according to zuul it is running jobs, still waiting on the gui though | 23:50 |
sdague | hmmm... it looks like the only place we are finding new errors in logs is in grenade, which wasn't quite the intent of that job. | 23:50 |
*** flaper87 is now known as flaper87|afk | 23:50 | |
*** denis_makogon_ has quit IRC | 23:51 | |
sdague | I think it might be worth turning that off - https://review.openstack.org/#/c/62107/ | 23:51 |
*** esker has quit IRC | 23:52 | |
fungi | sdague: makes sense | 23:52 |
*** esker has joined #openstack-infra | 23:53 | |
fungi | error checks against stable, particularly, are going to be myriad until the icehouse release, i expect | 23:53 |
fungi | clarkb: jenkins01 looks happytimes | 23:54 |
jeblair | ooh neat you can collapse the executor status box | 23:54 |
clarkb | fungi: yup seems to be doing its job | 23:54 |
*** sarob has quit IRC | 23:54 | |
jeblair | "master + 115 computers (7 of 8 executors)" | 23:54 |
jeblair | no idea what "7 of 8 executors" means. | 23:55 |
* fungi nods. +/- glyphs | 23:55 | |
fungi | dunno, but it's fancified | 23:55 |
clarkb | jeblair: now, do you want to let 01 burn in? | 23:55 |
*** sarob has joined #openstack-infra | 23:56 | |
jeblair | clarkb: yeah, i kinda do. see what it looks like after a few hours/days of thrashing | 23:56 |
clarkb | wfm | 23:56 |
fungi | also, once we're done upgrading 01 and 02 we should not forget poor jenkins.o.o | 23:56 |
fungi | but the weekend (or at least a night of churning through the gate) should give us some idea | 23:56 |
jeblair | clarkb: hopefully if something does go wrong, 02 will continue to keep things going | 23:56 |
clarkb | I will do my best to make time to check in and help upgrade the others on Monday | 23:57 |
*** esker has quit IRC | 23:57 | |
fungi | clarkb: where in ca (also, is that the state code or country code)? | 23:57 |
clarkb | fungi: the state | 23:58 |
fungi | i need to know whether to send sheriffs or mounties | 23:58 |
clarkb | I will be in sunnyvale | 23:58 |
jeblair | there are only 536 threads on 01 compared to 1,869 on 02 | 23:58 |
clarkb | jeblair: nice | 23:58 |
fungi | oh, sounds work-related. apologies | 23:58 |
clarkb | fungi: it is! but it is work related in a good way | 23:58 |
fungi | get zaro a fresh laptop as a souvenir | 23:58 |
clarkb | Should have lots of time to sit with AaronGr and go over all the things | 23:59 |
fungi | nice | 23:59 |
AaronGr | clarkb: exciting. | 23:59 |