jeblair | because of the rarity, i think filing bugs is ok, even if we don't end up doing anything with some of them because they are not ultimately actionable. | 00:00 |
---|---|---|
jeblair | it fits into the data collection strategy we have otherwise, so does not need an exception to the processes we're discussing | 00:01 |
clarkb | mordred: are we using wheels yet or are they just in the mirror so that we are ready for them? | 00:01 |
clarkb | mordred: the libvirt thing mikal is looking at started on the 15th and that appears to be when the mirror started doing wheels | 00:01 |
openstackgerrit | A change was merged to openstack-dev/hacking: Add a check for newline after docstring summary https://review.openstack.org/55644 | 00:04 |
fungi | another fun issue causing gate resets... nettron py26 unit tests taking over an hour https://jenkins01.openstack.org/job/gate-neutron-python26/3161/console | 00:07 |
fungi | s/nettron/neutron/ | 00:07 |
fungi | robotron 2084 | 00:07 |
fungi | looks like subunit2html.py got killed after 12 minutes of processing the subunit log | 00:09 |
clarkb | fungi: :/ we made that go faster but I think their log files are too gigantic | 00:09 |
*** CaptTofu has joined #openstack-infra | 00:11 | |
*** zaro0508 has joined #openstack-infra | 00:14 | |
*** vipul-away is now known as vipul | 00:16 | |
*** xeyed4good has quit IRC | 00:16 | |
*** harlowja has quit IRC | 00:16 | |
*** thomasem has joined #openstack-infra | 00:17 | |
*** xeyed4good has joined #openstack-infra | 00:17 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add query for bug 1252514 https://review.openstack.org/57070 | 00:17 |
uvirtbot | Launchpad bug 1252514 in swift "glance doesn't recover if Swift returns an error" [Undecided,New] https://launchpad.net/bugs/1252514 | 00:17 |
*** zaro0508 has quit IRC | 00:20 | |
*** zaro0508 has joined #openstack-infra | 00:20 | |
*** zaro0508 has joined #openstack-infra | 00:21 | |
*** dkliban has joined #openstack-infra | 00:21 | |
*** wenlock has quit IRC | 00:22 | |
*** hogepodge has quit IRC | 00:23 | |
*** thomasem has quit IRC | 00:26 | |
*** dkliban has quit IRC | 00:27 | |
*** matsuhashi has joined #openstack-infra | 00:29 | |
*** senk has joined #openstack-infra | 00:30 | |
*** pcrews has quit IRC | 00:31 | |
*** dkranz has joined #openstack-infra | 00:34 | |
*** loq_mac has joined #openstack-infra | 00:35 | |
*** mrodden has quit IRC | 00:36 | |
*** loq_mac has quit IRC | 00:37 | |
*** dcramer_ has joined #openstack-infra | 00:39 | |
*** MarkAtwood has quit IRC | 00:41 | |
*** CaptTofu has quit IRC | 00:42 | |
*** CaptTofu has joined #openstack-infra | 00:44 | |
*** matsuhashi has quit IRC | 00:48 | |
*** matsuhashi has joined #openstack-infra | 00:49 | |
*** mrodden has joined #openstack-infra | 00:50 | |
*** matsuhashi has quit IRC | 00:53 | |
*** alchen99 has quit IRC | 00:54 | |
*** jcooley_ has quit IRC | 00:55 | |
clarkb | I am going to delete the 7 nodes I held with nodepool that were not failures now | 00:55 |
*** jcooley_ has joined #openstack-infra | 00:55 | |
*** alchen99 has joined #openstack-infra | 00:57 | |
*** alchen99 has quit IRC | 00:59 | |
*** alexpilotti has quit IRC | 00:59 | |
*** Ryan_Lane has quit IRC | 00:59 | |
*** Ryan_Lane has joined #openstack-infra | 00:59 | |
*** alchen99 has joined #openstack-infra | 00:59 | |
*** jcooley_ has quit IRC | 01:00 | |
*** reed has quit IRC | 01:02 | |
*** senk has quit IRC | 01:02 | |
*** oubiwann has joined #openstack-infra | 01:07 | |
*** matsuhashi has joined #openstack-infra | 01:08 | |
*** dcramer_ has quit IRC | 01:08 | |
*** nosnos has joined #openstack-infra | 01:09 | |
fungi | clarkb: jeblair: we seem to have grown a crust of ~100 nodepool nodes perpetually in a deleted state since earlier today. i'm guessing the periodic cleanup thread is maybe deadlocked again? should i get a thread dump and restart nodepool? | 01:18 |
jeblair | fungi: oh excellent; please do | 01:18 |
clarkb | fungi: ++ | 01:18 |
fungi | on it | 01:19 |
*** dcramer_ has joined #openstack-infra | 01:20 | |
*** sarob has quit IRC | 01:22 | |
fungi | clarkb: jeblair: trimming the stack dump out of the debug log, it's still nearly 5k lines... find it at nodepool:~fungi/stack_dump.log | 01:28 |
fungi | restarting nodepool now | 01:28 |
*** svarnau has quit IRC | 01:29 | |
*** wenlock has joined #openstack-infra | 01:29 | |
fungi | self.periodicCleanup(session) is on line 1138 | 01:30 |
*** senk has joined #openstack-infra | 01:31 | |
fungi | am i reading that correctly that it's blocked on ListFloatingIPsTask()? | 01:31 |
*** markwash has quit IRC | 01:31 | |
*** senk has joined #openstack-infra | 01:32 | |
jeblair | fungi: what's the name of the thread? | 01:32 |
fungi | though looks like there are other threads also looping in a wait | 01:32 |
fungi | Thread: Thread-12214 (140153506952960) | 01:33 |
jeblair | fungi: that appears to be the case | 01:33 |
fungi | i see about half a dozen threads that might be similarly blocking on that call | 01:34 |
jeblair | fungi: 17 | 01:34 |
*** matsuhashi has quit IRC | 01:34 | |
fungi | yeah, i lowballed. grep -c next time ;) | 01:34 |
*** matsuhashi has joined #openstack-infra | 01:35 | |
*** wenlock has quit IRC | 01:35 | |
fungi | so i guess there are clearly times where that does not return. should we be wrapping that call in a timeout? | 01:35 |
*** jamesmcarthur has joined #openstack-infra | 01:36 | |
*** matsuhashi has quit IRC | 01:40 | |
*** matsuhashi has joined #openstack-infra | 01:43 | |
jeblair | fungi: i don't think it was hung | 01:43 |
jeblair | fungi: i think the problem is that periodic cleanup exceptions on a node stop the thread | 01:43 |
jeblair | fungi: grep "periodic cleanup" /var/log/nodepool/debug.log | 01:44 |
*** noorul has left #openstack-infra | 01:45 | |
fungi | InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: UPDATE statement on table 'node' expected to update 1 row(s); 0 were matched. | 01:46 |
fungi | nice | 01:46 |
clarkb | so the cleanup thread dies? (eating dinner so mostly afk) | 01:46 |
jeblair | clarkb: exits | 01:46 |
jeblair | fungi: i think it's racing with the normal delete threads | 01:47 |
*** alchen99 has quit IRC | 01:49 | |
jeblair | periodic should probably only delete a node in the delete state if it's been in that state for at least 15 mins | 01:49 |
fungi | fair, it should realistically never take that long for the normal cleanup to run i guess | 01:50 |
jeblair | fungi: it times out after 10 mins | 01:50 |
jeblair | we might actually just want to do that first before changing the exception handler, to flush out any similar bugs | 01:51 |
*** harlowja has joined #openstack-infra | 01:51 | |
jeblair | anyway, me -> dinner | 01:52 |
fungi | ahh, so it does... for count in iterate_timeout(600, "waiting for server %s deletion" | 01:52 |
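A rough Python sketch of the two fixes being discussed: skip nodes that have not yet gone stale in the delete state (so the periodic pass stops racing the regular delete threads, which themselves time out after 10 minutes), and catch per-node exceptions so a single SQLAlchemy rollback no longer kills the whole cleanup thread. The attribute names, the `delete_node` helper, and the 15-minute threshold are assumptions for illustration, not nodepool's actual API.

```python
import time
import logging

log = logging.getLogger("nodepool.cleanup")

DELETE_STALE_SECONDS = 15 * 60  # only reap nodes stuck in 'delete' longer than this


def periodic_cleanup(nodes, delete_node):
    """Guarded cleanup pass over an iterable of node records.

    Assumes each node exposes .id, .state and .state_time (epoch seconds
    of its last state change); delete_node(node) is the existing
    deletion routine.
    """
    now = time.time()
    for node in nodes:
        if node.state == 'delete' and now - node.state_time < DELETE_STALE_SECONDS:
            # A regular delete thread is probably still working on this
            # node; deleting it here as well is what triggers the
            # session-rollback race quoted above.
            continue
        try:
            delete_node(node)
        except Exception:
            # Log and continue instead of letting the exception unwind
            # the whole periodic cleanup thread.
            log.exception("periodic cleanup failed for node %s", node.id)
```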
*** jcooley_ has joined #openstack-infra | 01:54 | |
*** bingbu has joined #openstack-infra | 01:55 | |
*** jaypipes has quit IRC | 01:58 | |
*** sjing has joined #openstack-infra | 01:58 | |
*** nati_ueno has joined #openstack-infra | 01:59 | |
*** oubiwann has quit IRC | 02:03 | |
lifeless | ttx: https://wiki.openstack.org/wiki/Governance/Foundation/TechnicalCommittee - 'spring' and 'fall' are meaningless terms. Can we change that to specify calendar months? Or hemispheres? | 02:03 |
*** yaguang has joined #openstack-infra | 02:04 | |
*** alexpilotti has joined #openstack-infra | 02:06 | |
*** sjing has quit IRC | 02:07 | |
*** changbl has joined #openstack-infra | 02:09 | |
*** sjing has joined #openstack-infra | 02:09 | |
*** dolphm has joined #openstack-infra | 02:11 | |
*** gyee has quit IRC | 02:13 | |
*** metabro has quit IRC | 02:14 | |
*** xeyed4good has quit IRC | 02:16 | |
*** ilyashakhat has quit IRC | 02:17 | |
*** changbl has quit IRC | 02:19 | |
*** pcrews has joined #openstack-infra | 02:22 | |
*** ogelbukh has quit IRC | 02:26 | |
*** yamahata_ has joined #openstack-infra | 02:28 | |
*** Ryan_Lane has quit IRC | 02:28 | |
*** sarob has joined #openstack-infra | 02:29 | |
*** david-lyle has quit IRC | 02:30 | |
*** david-lyle has joined #openstack-infra | 02:31 | |
*** yamahata_ has quit IRC | 02:33 | |
*** mrodden has quit IRC | 02:34 | |
*** senk has quit IRC | 02:38 | |
*** sarob has quit IRC | 02:40 | |
*** sarob has joined #openstack-infra | 02:42 | |
*** jerryz has quit IRC | 02:42 | |
*** sarob has quit IRC | 02:47 | |
*** sarob has joined #openstack-infra | 02:48 | |
*** yamahata_ has joined #openstack-infra | 02:50 | |
*** mrodden has joined #openstack-infra | 02:50 | |
*** senk has joined #openstack-infra | 02:51 | |
*** sarob has quit IRC | 02:52 | |
*** dcramer_ has quit IRC | 02:54 | |
*** dolphm has quit IRC | 02:59 | |
*** senk has quit IRC | 03:00 | |
*** herndon_ has quit IRC | 03:02 | |
*** changbl has joined #openstack-infra | 03:02 | |
*** melwitt has quit IRC | 03:02 | |
*** xeyed4good has joined #openstack-infra | 03:08 | |
*** dkranz has quit IRC | 03:09 | |
jog0 | holy crap: gate is 108 long | 03:11 |
jog0 | and check is 30 | 03:12 |
*** nati_ueno has quit IRC | 03:14 | |
*** nati_ueno has joined #openstack-infra | 03:16 | |
*** D30 has joined #openstack-infra | 03:17 | |
*** sileht has quit IRC | 03:17 | |
clarkb | jog0: its been that way all day | 03:17 |
*** nati_ueno has quit IRC | 03:20 | |
*** sarob has joined #openstack-infra | 03:21 | |
*** michchap has quit IRC | 03:21 | |
*** michchap has joined #openstack-infra | 03:22 | |
*** DennyZhang has joined #openstack-infra | 03:24 | |
jog0 | clarkb: blames nova console log | 03:27 |
* jog0 blames ^ | 03:27 | |
notmyname | jog0: I don't want to sound pessimistic, but did we have any patches pass jenkins today? | 03:29 |
*** matsuhashi has quit IRC | 03:31 | |
notmyname | 108 jobs in the gate is 16 more than this morning. that's the wrong direction! ;-) | 03:31 |
*** matsuhashi has joined #openstack-infra | 03:31 | |
notmyname | jog0: do we need to stop triggering retries until things settle down? | 03:31 |
*** sjing has quit IRC | 03:32 | |
*** sjing has joined #openstack-infra | 03:33 | |
*** Ryan_Lane has joined #openstack-infra | 03:34 | |
fungi | there were definitely changes making it through. i saw post jobs (coverage, branch-tarball, et cetera) running for them from time to time | 03:34 |
fungi | specifically for projects which are part of the integrated queue | 03:35 |
*** matsuhashi has quit IRC | 03:36 | |
*** fifieldt has joined #openstack-infra | 03:36 | |
notmyname | fungi: should we hold off on doing rechecks? | 03:37 |
fungi | notmyname: i don't know that it would help any more than avoiding uploading or approving changes | 03:39 |
*** DennyZhang has quit IRC | 03:39 | |
fungi | i just hope it gains some ground over the coming hours when activity is lower | 03:39 |
notmyname | fungi: well, it would lower check jobs that simply failed but haven't been fully reviewed or aren't ready to merge yet | 03:39 |
fungi | the check queue isn't really starving the gate | 03:40 |
notmyname | ok | 03:40 |
*** matsuhashi has joined #openstack-infra | 03:41 | |
notmyname | do you know what the baseline was this morning? ie what number will be better or worse in the morning? | 03:41 |
notmyname | *morning == in 10 hours for everyone | 03:41 |
jog0 | notmyname: you should sound pessimistic | 03:42 |
notmyname | ie with 70 jobs in the queue in 10 hours, will that be great or terrifying as to what the gate queue would look like in 24 hours from now? | 03:42 |
fungi | throughput is mostly being limited by the increased nondeterminism in the jobs being run, which keeps restarting jobs for the changes behind them | 03:42 |
notmyname | or do we need 7 jobs in the queue in 10 hours? | 03:42 |
*** dstanek has quit IRC | 03:43 | |
fungi | we had roughly 75 in the gate when i started working around 1400z | 03:43 |
fungi | and we've lost ground by about 30 since | 03:44 |
notmyname | or 50% if you want to be pessimistic like jog0 | 03:44 |
notmyname | I gotta go for the night. I know you all are working hard on it. thanks | 03:45 |
fungi | i suspect that the best hope for speeding it up would be if more development effort could be focused on identifying and fixing the various sources of nondeterminism which have found their way in, rather than on unrelated development efforts | 03:48 |
fungi | since the latter is adding fuel to the fire at the moment | 03:49 |
*** dstanek has joined #openstack-infra | 03:49 | |
*** dstanek has quit IRC | 03:50 | |
*** portante is now known as I-Am-Sam | 03:52 | |
*** CaptTofu has quit IRC | 03:53 | |
*** CaptTofu has joined #openstack-infra | 03:54 | |
*** guohliu has joined #openstack-infra | 03:54 | |
*** jcooley_ has quit IRC | 03:57 | |
*** yamahata_ has quit IRC | 03:58 | |
*** SergeyLukjanov has joined #openstack-infra | 04:01 | |
*** I-Am-Sam is now known as portante | 04:02 | |
*** boris-42 has joined #openstack-infra | 04:03 | |
*** ljjjusti1 has quit IRC | 04:05 | |
*** metabro has joined #openstack-infra | 04:10 | |
*** dstanek has joined #openstack-infra | 04:14 | |
openstackgerrit | Khai Do proposed a change to openstack-infra/pypi-mirror: add an export option https://review.openstack.org/57345 | 04:17 |
clarkb | lifeless we test openstack on libvirt 0.9.8 because cloud archive mongodb is broken (iirc this is why we dont use cloud archive) | 04:18 |
jog0 | notmyname: one of the bugs is swift config in devstack | 04:20 |
openstackgerrit | Khai Do proposed a change to openstack-infra/config: add nodepool to jenkins-dev server https://review.openstack.org/57333 | 04:21 |
jog0 | https://bugs.launchpad.net/bugs/1252514 | 04:21 |
uvirtbot | Launchpad bug 1252514 in swift "glance doesn't recover if Swift returns an error" [Undecided,New] | 04:21 |
*** mgagne has joined #openstack-infra | 04:22 | |
*** mgagne has quit IRC | 04:22 | |
*** mgagne has joined #openstack-infra | 04:22 | |
*** DinaBelova has joined #openstack-infra | 04:23 | |
*** ogelbukh has joined #openstack-infra | 04:24 | |
lifeless | clarkb: is there a bug open for that ? | 04:24 |
portante | jog0 looking | 04:24 |
*** wenlock has joined #openstack-infra | 04:25 | |
jog0 | portante: thanks | 04:25 |
clarkb | lifeless not sure jd_- was dealing with it | 04:26 |
clarkb | *jd__ | 04:26 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/nodepool: Skip periodic cleanup if the node is not stale https://review.openstack.org/57364 | 04:26 |
* portante goes and pulls the glance code ... | 04:26 | |
lifeless | clarkb: because, LTS w/out cloud archive isn't a config I expect to be representative of deployments | 04:27 |
*** DennyZhang has joined #openstack-infra | 04:28 | |
*** mgagne1 has joined #openstack-infra | 04:28 | |
*** mgagne1 has quit IRC | 04:28 | |
*** mgagne1 has joined #openstack-infra | 04:28 | |
clarkb | totally agree, but cloud archive can't do an all in one install (or couldn't) | 04:29 |
lifeless | cause of mongo? | 04:29 |
clarkb | yup | 04:30 |
*** mgagne has quit IRC | 04:31 | |
*** jcooley_ has joined #openstack-infra | 04:33 | |
*** masayukig has joined #openstack-infra | 04:34 | |
*** SergeyLukjanov has quit IRC | 04:37 | |
*** jcooley_ has quit IRC | 04:38 | |
portante | jog0: okay, so it looks like the proxy-server configuration for swift in devstack is set to 10 seconds, but the object server took around 43 seconds to create the object | 04:41 |
jog0 | portante: so what is the fix? | 04:41 |
jog0 | do we somehow make the object server faster? or just bump the timeout | 04:42 |
portante | so the tolerances for swift need to be loosened up a bit, it would seem | 04:42 |
jog0 | portante: ohh and awesome, thanks | 04:42 |
jog0 | want to propose a patch to devstack for this? | 04:42 |
portante | point me at a repo and gerrit? | 04:42 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Remove queries for dead bugs https://review.openstack.org/57367 | 04:42 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Add doc on queries.yaml https://review.openstack.org/57368 | 04:42 |
portante | never done that before | 04:42 |
jog0 | portante: http://git.openstack.org/cgit/openstack-dev/devstack/ | 04:43 |
* portante looks | 04:43 | |
jog0 | portante: I think you want to look at lib/swift | 04:43 |
*** dcramer_ has joined #openstack-infra | 04:43 | |
portante | k | 04:44 |
jog0 | and look at the iniset commands | 04:44 |
*** arata has joined #openstack-infra | 04:46 | |
*** sandywalsh has quit IRC | 04:49 | |
*** markwash has joined #openstack-infra | 04:51 | |
*** boris-42 has quit IRC | 04:53 | |
*** yamahata_ has joined #openstack-infra | 04:54 | |
*** boris-42 has joined #openstack-infra | 04:55 | |
*** boris-42 has quit IRC | 04:55 | |
*** dcramer_ has quit IRC | 04:55 | |
*** boris-42 has joined #openstack-infra | 04:56 | |
*** DennyZhang has quit IRC | 04:57 | |
mordred | clarkb: we should not be using wheels in any way yet | 04:58 |
clarkb | mordred thanks I didn't think so | 04:59 |
mordred | clarkb: also, even if we were, it shouldnt' affect libvirt, since we don't pip install that | 04:59 |
mordred | :( | 04:59 |
clarkb | right but it may have affected $otherthing possibly | 05:00 |
portante | jog0: see http://paste.openstack.org/show/53637/ | 05:00 |
clarkb | mordred: mikal and jog0 are closing in on the problem I think | 05:01 |
portante | the conn_timeout is all about how long it takes a connect() system call to return | 05:01 |
jog0 | portante: looks good to me | 05:01 |
*** sarob has quit IRC | 05:01 | |
portante | 20 seconds might be too generous | 05:01 |
jog0 | gitreview that sucker | 05:01 |
jog0 | portante: you're the swift expert, your call | 05:01 |
*** sarob has joined #openstack-infra | 05:01 | |
jog0 | clarkb: closing in, is a strong term | 05:01 |
portante | node_timeout is all about how long between read operations a node takes to respond to the proxy server | 05:01 |
*** nati_ueno has joined #openstack-infra | 05:02 | |
portante | jog0: telling you this so that you can adjust after I hit the sack | 05:02 |
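devstack sets these swift options through its `iniset <file> <section> <option> <value>` helper in lib/swift; below is a hedged sketch of what a bump like the one portante is describing could look like. The file path, section name and values are assumptions for illustration, not the actual contents of https://review.openstack.org/57373.

```bash
# Illustrative only; path, section and values are assumptions.
SWIFT_PROXY_CONF=/etc/swift/proxy-server.conf

# conn_timeout: how long a connect() to a backend server may take
iniset ${SWIFT_PROXY_CONF} app:proxy-server conn_timeout 20

# node_timeout: how long the proxy waits between reads from a backend
# node before giving up (and returning a 503 to the client)
iniset ${SWIFT_PROXY_CONF} app:proxy-server node_timeout 120
```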
clarkb | jog0: I fully expect the problem to be gone tomorrow :) | 05:02 |
portante | I did not set this up to file via gerrit, so could you do that? | 05:02 |
portante | jog0 ? | 05:03 |
clarkb | portante I can do it in the morning if no one beats me to it | 05:03 |
jog0 | portante: sure | 05:03 |
portante | okay | 05:03 |
portante | what is the load like on the host machines all this runs on? | 05:05 |
*** yamahata_ has quit IRC | 05:05 | |
portante | clarkb, jog0? | 05:05 |
clarkb | it can be a little high but not for long sustained periods of time | 05:05 |
portante | enough to kill one request | 05:05 |
clarkb | possibly | 05:06 |
portante | So glance gets a 503 from swift and just gives up, which it should | 05:06 |
*** sarob has quit IRC | 05:06 | |
portante | but swift is actually completing the request behind the scenes | 05:06 |
portante | we should get this behavior in a bug for the swift team to comment on | 05:07 |
jog0 | I figured glance should give up but wasn't 100% sure | 05:07 |
jog0 | portante: leave a comment on https://bugs.launchpad.net/glance/+bug/1252514 | 05:07 |
uvirtbot | Launchpad bug 1252514 in swift "glance doesn't recover if Swift returns an error" [Undecided,New] | 05:07 |
portante | k | 05:07 |
*** xeyed4good has quit IRC | 05:08 | |
*** dcramer_ has joined #openstack-infra | 05:08 | |
*** sdake_ has joined #openstack-infra | 05:11 | |
jog0 | portante: https://review.openstack.org/57373 | 05:13 |
*** jcooley_ has joined #openstack-infra | 05:14 | |
*** arata has quit IRC | 05:14 | |
*** yamahata_ has joined #openstack-infra | 05:15 | |
portante | jog0: posted a +1 for that | 05:18 |
*** jcooley_ has quit IRC | 05:19 | |
jog0 | portante: woot! | 05:21 |
jog0 | horray teamwork | 05:22 |
portante | +1 | 05:22 |
portante | ship it | 05:22 |
portante | let's get one more gate job failure out of the way | 05:22 |
*** chandankumar has joined #openstack-infra | 05:22 | |
portante | or perhaps even "set of failures" out of the way | 05:22 |
jog0 | sdague, other devstack cores ^ | 05:27 |
*** sarob has joined #openstack-infra | 05:28 | |
*** jamesmcarthur has quit IRC | 05:29 | |
*** yamahata_ has quit IRC | 05:33 | |
portante | jog0: when will that make it into the gate jobs? | 05:36 |
jog0 | portante: when the devstack cores are around to +2 it and we can squeeze it through the gate | 05:38 |
jog0 | portante: raw numbers on gate issues http://paste.debian.net/66730/ | 05:38 |
portante | k, thanks, so the two big boys are not addressed by this, but it looks like this change will probably help out with two or three others | 05:40 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Add launchpad support to check_success https://review.openstack.org/57374 | 05:40 |
jog0 | portante: yeah about to send out an email with that data but annotated | 05:41 |
portante | kewl, thanks | 05:42 |
* portante hits the sack | 05:42 | |
*** mgagne1 is now known as mgagne | 05:44 | |
sdague | jog0: just got back from dinner, looking | 05:45 |
*** jhesketh__ has quit IRC | 05:45 | |
sdague | jog0: bash8 is going to fail you for that :) | 05:46 |
sdague | mr tabs man | 05:46 |
*** ljjjustin has joined #openstack-infra | 05:46 | |
*** jhesketh__ has joined #openstack-infra | 05:47 | |
*** DinaBelova has quit IRC | 05:49 | |
*** SergeyLukjanov has joined #openstack-infra | 05:50 | |
jog0 | sdague: just fixed | 05:51 |
*** harlowja has quit IRC | 05:52 | |
sdague | you didn't read my other comment though | 05:52 |
*** marun has joined #openstack-infra | 05:55 | |
*** marun has quit IRC | 05:55 | |
*** vipul has quit IRC | 05:56 | |
*** mihgen has joined #openstack-infra | 05:56 | |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Add launchpad support to check_success https://review.openstack.org/57374 | 05:56 |
*** vipul has joined #openstack-infra | 05:56 | |
*** sdake_ has quit IRC | 05:59 | |
*** sdake_ has joined #openstack-infra | 05:59 | |
*** sdake_ has quit IRC | 05:59 | |
*** sdake_ has joined #openstack-infra | 05:59 | |
*** nati_ueno has quit IRC | 06:03 | |
*** senk has joined #openstack-infra | 06:03 | |
*** nati_ueno has joined #openstack-infra | 06:04 | |
*** michchap has quit IRC | 06:04 | |
*** michchap has joined #openstack-infra | 06:05 | |
*** matsuhashi has quit IRC | 06:08 | |
*** matsuhashi has joined #openstack-infra | 06:08 | |
*** davidhadas has joined #openstack-infra | 06:08 | |
*** Ryan_Lane has quit IRC | 06:12 | |
*** matsuhashi has quit IRC | 06:13 | |
*** Ryan_Lane has joined #openstack-infra | 06:13 | |
*** Ryan_Lane has joined #openstack-infra | 06:13 | |
*** mestery has quit IRC | 06:14 | |
*** sarob has quit IRC | 06:15 | |
*** sarob has joined #openstack-infra | 06:15 | |
*** mestery has joined #openstack-infra | 06:18 | |
*** senk has quit IRC | 06:18 | |
*** ljjjustin has quit IRC | 06:21 | |
*** jcooley_ has joined #openstack-infra | 06:21 | |
*** markwash has quit IRC | 06:23 | |
*** yongli has joined #openstack-infra | 06:24 | |
*** jcooley_ has quit IRC | 06:24 | |
*** nosnos_ has joined #openstack-infra | 06:25 | |
openstackgerrit | Sergey Lukjanov proposed a change to openstack-infra/config: Setup devstack-gate tests for Savanna https://review.openstack.org/57317 | 06:25 |
*** senk has joined #openstack-infra | 06:25 | |
*** nosnos has quit IRC | 06:29 | |
*** matsuhashi has joined #openstack-infra | 06:29 | |
*** senk has quit IRC | 06:30 | |
*** vipul is now known as vipul-away | 06:34 | |
*** jcooley_ has joined #openstack-infra | 06:35 | |
*** jhesketh__ has quit IRC | 06:37 | |
*** marun has joined #openstack-infra | 06:38 | |
*** davidhadas has quit IRC | 06:42 | |
*** sdake_ has quit IRC | 06:44 | |
*** sarob has joined #openstack-infra | 06:47 | |
*** afazekas has quit IRC | 06:50 | |
*** dstanek has quit IRC | 06:51 | |
*** jcooley_ has quit IRC | 06:51 | |
*** SergeyLukjanov has quit IRC | 06:52 | |
*** luisg has quit IRC | 06:54 | |
*** cody-somerville has quit IRC | 06:55 | |
*** odyssey4me has joined #openstack-infra | 06:59 | |
*** sdake_ has joined #openstack-infra | 07:00 | |
*** sarob has quit IRC | 07:03 | |
*** jcooley_ has joined #openstack-infra | 07:03 | |
*** nosnos_ has quit IRC | 07:05 | |
*** sjing has quit IRC | 07:05 | |
*** arata has joined #openstack-infra | 07:05 | |
*** nosnos has joined #openstack-infra | 07:05 | |
*** odyssey4me has quit IRC | 07:05 | |
*** sjing has joined #openstack-infra | 07:06 | |
*** vipul-away is now known as vipul | 07:07 | |
*** mgagne1 has joined #openstack-infra | 07:08 | |
*** mgagne1 has quit IRC | 07:08 | |
*** mgagne1 has joined #openstack-infra | 07:08 | |
*** mgagne has quit IRC | 07:09 | |
*** jcooley_ has quit IRC | 07:10 | |
*** ljjjustin has joined #openstack-infra | 07:11 | |
*** marun has quit IRC | 07:13 | |
*** marun has joined #openstack-infra | 07:14 | |
*** denis_makogon has joined #openstack-infra | 07:21 | |
*** matsuhashi has quit IRC | 07:22 | |
*** matsuhashi has joined #openstack-infra | 07:23 | |
*** yolanda has joined #openstack-infra | 07:25 | |
*** afazekas_ has joined #openstack-infra | 07:29 | |
*** matsuhas_ has joined #openstack-infra | 07:30 | |
*** bingbu has quit IRC | 07:31 | |
*** bingbu has joined #openstack-infra | 07:32 | |
*** wenlock has quit IRC | 07:33 | |
*** matsuhashi has quit IRC | 07:34 | |
*** nsaje has joined #openstack-infra | 07:34 | |
*** che-arne has quit IRC | 07:35 | |
*** nsaje has quit IRC | 07:35 | |
*** DinaBelova has joined #openstack-infra | 07:35 | |
*** flaper87|afk is now known as flaper87 | 07:36 | |
*** mgagne1 has quit IRC | 07:45 | |
*** davidhadas has joined #openstack-infra | 07:51 | |
*** DinaBelova has quit IRC | 07:52 | |
*** sileht has joined #openstack-infra | 07:55 | |
*** sileht has quit IRC | 07:55 | |
*** sileht_ has joined #openstack-infra | 07:55 | |
*** sileht_ is now known as sileht | 07:56 | |
*** sdake_ has quit IRC | 07:56 | |
*** sarob has joined #openstack-infra | 07:59 | |
*** sarob has quit IRC | 08:03 | |
*** SergeyLukjanov has joined #openstack-infra | 08:04 | |
*** mihgen has quit IRC | 08:05 | |
*** marun has quit IRC | 08:05 | |
*** marun has joined #openstack-infra | 08:05 | |
*** dizquierdo has joined #openstack-infra | 08:06 | |
*** xeyed4good has joined #openstack-infra | 08:08 | |
*** jcooley_ has joined #openstack-infra | 08:11 | |
*** osanchez has joined #openstack-infra | 08:12 | |
*** xeyed4good has quit IRC | 08:12 | |
*** nsaje has joined #openstack-infra | 08:14 | |
*** Hefeweizen has quit IRC | 08:17 | |
*** matsuhas_ has quit IRC | 08:18 | |
*** matsuhashi has joined #openstack-infra | 08:19 | |
*** fbo_away is now known as fbo | 08:19 | |
*** DinaBelova has joined #openstack-infra | 08:20 | |
*** matsuhashi has quit IRC | 08:24 | |
*** boris-42 has quit IRC | 08:26 | |
*** boris-42 has joined #openstack-infra | 08:28 | |
openstackgerrit | A change was merged to openstack-dev/pbr: Ignore jenkins@openstack.org in authors building https://review.openstack.org/56407 | 08:29 |
*** matsuhashi has joined #openstack-infra | 08:31 | |
*** resker has joined #openstack-infra | 08:36 | |
*** arata has left #openstack-infra | 08:38 | |
openstackgerrit | David Caro proposed a change to openstack-infra/jenkins-job-builder: Added config options to not overwrite jobs desc https://review.openstack.org/52080 | 08:38 |
*** esker has quit IRC | 08:39 | |
*** mihgen has joined #openstack-infra | 08:41 | |
*** hashar has joined #openstack-infra | 08:43 | |
*** denis_makogon has quit IRC | 08:43 | |
*** jcoufal has joined #openstack-infra | 08:44 | |
*** jcooley_ has quit IRC | 08:45 | |
*** shardy_afk is now known as shardy | 08:46 | |
*** sarob has joined #openstack-infra | 08:47 | |
*** boris-42 has quit IRC | 08:49 | |
*** DinaBelova has quit IRC | 08:52 | |
*** ljjjustin has quit IRC | 08:52 | |
*** derekh has joined #openstack-infra | 08:54 | |
*** guohliu has quit IRC | 08:58 | |
ttx | lifeless: about spring/fall, feel free to propose alternate wording (it's in openstack/governance:reference/charter) -- the trick is since elections happen a number of weeks before release, I wanted to stay fuzzy ("spring" = March-May, "fall" = September-November) rather than write month names in stone | 08:58 |
*** nati_ueno has quit IRC | 08:59 | |
*** ilyashakhat has joined #openstack-infra | 08:59 | |
*** DinaBelova has joined #openstack-infra | 08:59 | |
lifeless | ttx: there you go. | 09:00 |
ttx | if it fell on clear quarters we could have used Q2/Q4 but that's not really the case | 09:01 |
*** yassine has joined #openstack-infra | 09:06 | |
*** jpich has joined #openstack-infra | 09:06 | |
*** arata has joined #openstack-infra | 09:10 | |
openstackgerrit | Marcus Nilsson proposed a change to openstack-infra/jenkins-job-builder: Added support for Stash Notifier https://review.openstack.org/56337 | 09:11 |
*** marun has quit IRC | 09:13 | |
lifeless | ttx: sure, my main point was that fall and spring are relative terms | 09:14 |
*** zaro0508 has quit IRC | 09:14 | |
*** zaro0508 has joined #openstack-infra | 09:14 | |
lifeless | ttx: and it's hemispherist to assume they are northern without calling it out | 09:14 |
*** marun has joined #openstack-infra | 09:14 | |
ttx | I definitely am an hemispherist. I should get invited south more often | 09:16 |
lifeless | ttx: open invite here. | 09:16 |
lifeless | ttx: just bring a crate of nice French wine. | 09:16 |
*** sjing has quit IRC | 09:17 | |
*** arata has quit IRC | 09:19 | |
*** sarob has quit IRC | 09:21 | |
*** alexpilotti has quit IRC | 09:23 | |
*** bingbu has quit IRC | 09:25 | |
*** talluri has joined #openstack-infra | 09:28 | |
yolanda | mordred, jeblair, any update to the licensecheck jenkins bug? | 09:29 |
jcoufal | hey, can we add #openstack-ux channel to the list of OpenStack IRCs? (https://wiki.openstack.org/wiki/IRC) | 09:29 |
*** afazekas_ is now known as afazekas | 09:34 | |
*** Ryan_Lane has quit IRC | 09:34 | |
*** D30 has quit IRC | 09:35 | |
DinaBelova | hello, guys! We had some strange behavior with the Jenkins tests yesterday, and today the problem seems to be unresolved. On the same change, pep8 may fail (with errors like 'module X is not a module. Import only modules') or may pass. Like for this change https://review.openstack.org/#/c/57106/ (it was merged finally): failed logs - http://logs.openstack.org/06/57106/2/check/gate-climate-pep8/ac52a11/console.html good ones - | 09:35 |
DinaBelova | http://logs.openstack.org/06/57106/2/gate/gate-climate-pep8/44394c9/console.html | 09:35 |
DinaBelova | Do you have any idea what it may be connected with? | 09:36 |
*** pblaho has joined #openstack-infra | 09:38 | |
*** dizquierdo has quit IRC | 09:39 | |
*** D30 has joined #openstack-infra | 09:40 | |
ogelbukh | DinaBelova: gate seems to be unstable for last two days at least | 09:41 |
*** D30 has quit IRC | 09:41 | |
*** jcooley_ has joined #openstack-infra | 09:41 | |
DinaBelova | ogelbukh, I've got it... But really I missed if there were any comments or other guys' complaints... So that's the known issue? | 09:43 |
jpich | jcoufal: You should be able to edit the page yourself to add it, I don't think there are other requirements | 09:44 |
jcoufal | jpich: okey, great, I just didn't know if there is need for some approval from openstack-infra team | 09:45 |
ogelbukh | DinaBelova: here's some info on this http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html | 09:46 |
DinaBelova | ogelbukh, thank you so much | 09:46 |
*** sarob has joined #openstack-infra | 09:47 | |
*** jcooley_ has quit IRC | 09:47 | |
*** davidhadas has quit IRC | 09:48 | |
jpich | jcoufal: They will probably request to register it under infra, which should be fine. We can ask about that in the afternoon when the infra chaps wake up :) | 09:48 |
jcoufal | jpich: sure | 09:48 |
*** resker has quit IRC | 09:49 | |
*** esker has joined #openstack-infra | 09:50 | |
*** sarob has quit IRC | 09:51 | |
*** odyssey4me has joined #openstack-infra | 09:52 | |
*** esker has quit IRC | 09:54 | |
*** talluri has quit IRC | 09:55 | |
*** mattymo has joined #openstack-infra | 09:56 | |
*** talluri has joined #openstack-infra | 09:56 | |
*** masayukig has quit IRC | 09:58 | |
*** Ryan_Lane has joined #openstack-infra | 10:04 | |
*** davidhadas has joined #openstack-infra | 10:08 | |
*** Ryan_Lane has quit IRC | 10:13 | |
*** matsuhashi has quit IRC | 10:14 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 10:16 | |
*** _SergeyLukjanov has quit IRC | 10:17 | |
*** SergeyLukjanov has joined #openstack-infra | 10:21 | |
*** ruhe has joined #openstack-infra | 10:21 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 10:24 | |
*** SergeyLukjanov has joined #openstack-infra | 10:24 | |
*** nsaje has quit IRC | 10:26 | |
*** plomakin has quit IRC | 10:26 | |
*** nsaje has joined #openstack-infra | 10:27 | |
lifeless | ttx: do you happen to have a script to assess LP bug activity atm ? | 10:29 |
lifeless | ttx: see https://etherpad.openstack.org/p/nova-bug-triage for context | 10:29 |
*** talluri has quit IRC | 10:30 | |
*** SergeyLukjanov has quit IRC | 10:30 | |
ttx | lifeless: the only time-based data I have would be http://status.openstack.org/bugday/ | 10:30 |
ttx | (recent bug activity) | 10:30 |
ttx | launchpad is desperately dry when it comes to historical data, as you probably know | 10:31 |
ttx | I may have other webnumbr's around though | 10:31 |
*** nsaje has quit IRC | 10:31 | |
* ttx digs deeper | 10:32 | |
ttx | http://webnumbr.com/untouched-nova-bugs | 10:32 |
ttx | http://webnumbr.com/open-nova-bugs | 10:33 |
ttx | http://webnumbr.com/nova-bugfixes | 10:34 |
ttx | lifeless: that all I have for nova ^ | 10:34 |
lifeless | ttx: I think I'll refactor reviewstats to be unrecognisable | 10:38 |
lifeless | and then feed in bug data as a source | 10:38 |
lifeless | cause we all need a new data analytics framework | 10:38 |
ttx | I'm interested in what you come up with. | 10:39 |
ttx | those workarounds above all query LP at regular intervals and try to build some historical data, but the queries are quite narrow | 10:40 |
*** lcestari has joined #openstack-infra | 10:41 | |
*** DinaBelova has quit IRC | 10:41 | |
*** marun has quit IRC | 10:42 | |
*** jcooley_ has joined #openstack-infra | 10:43 | |
hashar | <rant>attempted to switch my Zuul setup to use the Gearman version, turns out the Jenkins gearman plugin has a very nasty bug :/ </rant> | 10:44 |
* hashar blames Zaro and jeblair :D | 10:44 | |
lifeless | ttx: meh, I'll just query the s**t out of LP. | 10:46 |
lifeless | ttx: iteration 0, work but not be pretty. | 10:46 |
ttx | lifeless: heh, I wonder how much % of LP total traffic can be traced back to people regularly querying it to work around its lack of historical data and graphs. | 10:46 |
*** osanchez is now known as OlivierSanchez | 10:46 | |
*** sarob has joined #openstack-infra | 10:47 | |
lifeless | ttx: a fairly large amount, but that traffic is also cheap to answer. | 10:47 |
lifeless | ttx: we had trouble when we had several thousand such scripts all running at once in the OEM team | 10:47 |
mattymo | is anyone aware of an IRC bot that watches changes in Launchpad bugs, similar to our lovely gerritbot? | 10:49 |
*** ruhe has joined #openstack-infra | 10:50 | |
*** sarob has quit IRC | 10:51 | |
*** OlivierSanchez is now known as osanchez | 10:52 | |
*** osanchez has quit IRC | 10:54 | |
*** osanchez has joined #openstack-infra | 10:54 | |
lifeless | mattymo: I'm sure there are several | 10:57 |
lifeless | mattymo: launchpad-users list would be a place to ask | 10:58 |
*** ruhe has quit IRC | 10:58 | |
*** SergeyLukjanov has joined #openstack-infra | 10:59 | |
*** dpyzhov has joined #openstack-infra | 10:59 | |
ogelbukh | lifeless: I guess mattymo means if there anything like that in infra | 11:00 |
dpyzhov | hi. what happened with review.openstack.org? Reviews with +2 are hanging unmerged | 11:01 |
dpyzhov | Any estimates for fix? | 11:01 |
*** DinaBelova has joined #openstack-infra | 11:04 | |
*** odyssey4me has quit IRC | 11:05 | |
*** ruhe has joined #openstack-infra | 11:09 | |
lifeless | dpyzhov: see joe's email to -dev | 11:09 |
lifeless | dpyzhov: bad tests/code in trunk -> flaky gate -> backlog | 11:09 |
*** Ryan_Lane has joined #openstack-infra | 11:10 | |
mattymo | lifeless, this one? http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html | 11:11 |
*** sgran has joined #openstack-infra | 11:12 | |
sgran | hello. I'm curious who I talk to about a tempest change | 11:12 |
*** sandywalsh has joined #openstack-infra | 11:12 | |
ogelbukh | mattymo: that one, yes | 11:12 |
sgran | https://review.openstack.org/#/c/57311/ is the one I'm looking at | 11:12 |
mattymo | it doesn't indicate that merging was going to be frozen for any project (including stackforge) | 11:13 |
*** yaguang has quit IRC | 11:13 | |
*** odyssey4me has joined #openstack-infra | 11:14 | |
lifeless | mattymo: it's not frozen; it's just very, very slow because all the optimisations depend on a low gate failure rate. | 11:14 |
*** Ryan_Lane has quit IRC | 11:15 | |
mattymo | nice | 11:16 |
*** mihgen has quit IRC | 11:16 | |
*** jcooley_ has quit IRC | 11:17 | |
*** pcm_ has joined #openstack-infra | 11:19 | |
*** pcm_ has quit IRC | 11:23 | |
*** pcm_ has joined #openstack-infra | 11:24 | |
*** mihgen has joined #openstack-infra | 11:25 | |
*** davidhadas has quit IRC | 11:35 | |
*** mihgen_ has joined #openstack-infra | 11:41 | |
*** mihgen has quit IRC | 11:45 | |
*** mihgen_ is now known as mihgen | 11:45 | |
*** sarob has joined #openstack-infra | 11:47 | |
*** hashar_ has joined #openstack-infra | 11:48 | |
*** hashar has quit IRC | 11:49 | |
*** hashar_ is now known as hashar | 11:49 | |
*** dstanek has joined #openstack-infra | 11:51 | |
*** dstanek has quit IRC | 11:55 | |
ekarlso | what is the new local.conf thing in devstack ? | 11:59 |
*** yamahata_ has joined #openstack-infra | 12:00 | |
*** davidhadas has joined #openstack-infra | 12:00 | |
*** resker has joined #openstack-infra | 12:04 | |
*** resker has quit IRC | 12:05 | |
BobBall | ekarlso: it's an amalgamation of localrc and other changes you might want to make to nova.conf post-devstack installation | 12:08 |
BobBall | you can use the existing localrc if you want, or migrate it to the brave new world | 12:08 |
*** nosnos_ has joined #openstack-infra | 12:09 | |
BobBall | ekarlso: http://devstack.org/localrc.html | 12:09 |
ekarlso | oh | 12:10 |
ekarlso | cool! | 12:10 |
*** odyssey4me has quit IRC | 12:11 | |
*** Ryan_Lane has joined #openstack-infra | 12:11 | |
*** nosnos has quit IRC | 12:12 | |
*** jcooley_ has joined #openstack-infra | 12:13 | |
*** nosnos_ has quit IRC | 12:14 | |
*** Ryan_Lane has quit IRC | 12:16 | |
*** nsaje has joined #openstack-infra | 12:16 | |
*** odyssey4me has joined #openstack-infra | 12:19 | |
*** jcooley_ has quit IRC | 12:19 | |
*** sarob has quit IRC | 12:20 | |
*** ruhe has quit IRC | 12:21 | |
*** jamesmcarthur has joined #openstack-infra | 12:23 | |
*** nsaje has quit IRC | 12:23 | |
*** nsaje has joined #openstack-infra | 12:23 | |
*** marun has joined #openstack-infra | 12:24 | |
*** michchap_ has joined #openstack-infra | 12:25 | |
*** michchap has quit IRC | 12:25 | |
*** ruhe has joined #openstack-infra | 12:26 | |
*** chuck__ is now known as zul | 12:26 | |
zul | gates are backed up i guess? | 12:27 |
*** ruhe has quit IRC | 12:27 | |
*** nsaje has quit IRC | 12:28 | |
*** boris-42 has joined #openstack-infra | 12:30 | |
*** ruhe has joined #openstack-infra | 12:31 | |
*** sarob has joined #openstack-infra | 12:32 | |
*** marun has quit IRC | 12:34 | |
*** marun has joined #openstack-infra | 12:34 | |
*** sarob has quit IRC | 12:37 | |
*** johnthetubaguy has joined #openstack-infra | 12:45 | |
*** davidhadas has quit IRC | 12:47 | |
*** sarob has joined #openstack-infra | 12:47 | |
*** pcm__ has joined #openstack-infra | 12:49 | |
*** sarob has quit IRC | 12:51 | |
*** pcm_ has quit IRC | 12:52 | |
*** AJaeger has joined #openstack-infra | 12:53 | |
*** pcm__ has quit IRC | 12:56 | |
*** dprince has joined #openstack-infra | 12:57 | |
*** changbl has quit IRC | 12:57 | |
*** pcm_ has joined #openstack-infra | 12:57 | |
*** hashar has quit IRC | 12:58 | |
*** jamesmcarthur has quit IRC | 13:01 | |
BobBall | I think it's a bit worse than backed up - but I'm not sure | 13:02 |
BobBall | gate should be given priority over check - but only a few gate jobs are running, but loads of checks are | 13:02 |
BobBall | https://review.openstack.org/#/c/56065/ was approved yesterday but gate jobs still haven't started... :) | 13:03 |
BobBall | Sounds like a broken gate to me! | 13:03 |
*** nsaje has joined #openstack-infra | 13:05 | |
*** fifieldt has quit IRC | 13:07 | |
*** ruhe has quit IRC | 13:08 | |
*** hashar has joined #openstack-infra | 13:08 | |
anteaya | hi BobBall yes gate is in a bad way | 13:09 |
BobBall | Poor thing... | 13:09 |
anteaya | we had a discussion about it last night, different options for what to do | 13:09 |
BobBall | Probably having a huff | 13:09 |
anteaya | I went to sleep before it all came to an end | 13:09 |
anteaya | was about to read the backlog | 13:09 |
* BobBall reads the backlog too :) | 13:09 | |
anteaya | I think rather a perfect storm of many things | 13:10 |
anteaya | trying to get them identified and deal with them effectively | 13:10 |
anteaya | :D | 13:10 |
BobBall | looks nasty :) | 13:11 |
anteaya | yeah, 120 in the gate and only the top 7 patches have running jobs atm | 13:12 |
*** thomasem has joined #openstack-infra | 13:12 | |
anteaya | it is | 13:12 |
*** Ryan_Lane has joined #openstack-infra | 13:12 | |
anteaya | with no clear and easy approach to fix | 13:12 |
BobBall | and unfortunately fungi's last comment was that he hoped things improved with lower activity | 13:12 |
BobBall | which they have not | 13:12 |
BobBall | it's just got worse! | 13:12 |
anteaya | yes | 13:12 |
anteaya | if you have time to add your thoughts, bug tracking expertise it would not go amiss today | 13:12 |
anteaya | methinks the gate will be at the fore of activity again today | 13:13 |
BobBall | so the problem is too many failures caused by bugs? | 13:13 |
*** dstanek has joined #openstack-infra | 13:13 | |
anteaya | that is a big part of it | 13:13 |
BobBall | in the rechecks queue? | 13:13 |
anteaya | the fact we lost jenkins01 very early Monday morning and haven't recovered since doesn't help either | 13:13 |
BobBall | well jenkins01 hasn't had a holiday for ages | 13:14 |
anteaya | jenkins01 is back online but the hiccup really set us back in terms of keeping up to the volume | 13:14 |
anteaya | yeah, and had a bit of a hard time Monday morning - too many processes running so the cron job failed and it reset its puppet.conf to a default | 13:15 |
anteaya | which set cert to undefined | 13:15 |
anteaya | it created and deleted nodes but ran no jobs until we got it back on track | 13:15 |
anteaya | then clarkb and fungi had to go in and manually delete unattached nodes | 13:15 |
anteaya | that was Monday | 13:15 |
*** jcooley_ has joined #openstack-infra | 13:15 | |
BobBall | blimey - as you say - a perfect storm | 13:15 |
anteaya | yes | 13:16 |
*** Ryan_Lane has quit IRC | 13:16 | |
*** alcabrera has joined #openstack-infra | 13:16 | |
anteaya | so any help or kind words you have would be most welcome | 13:16 |
*** boris-42 has quit IRC | 13:17 | |
*** afazekas is now known as afazekas_mtg | 13:18 | |
BobBall | I'm afraid that I could offer substantially less help than those who have already been involved... but I can have a look at a few of the recheck bugs to see if I can work out what's going on | 13:18 |
*** boris-42 has joined #openstack-infra | 13:18 | |
*** nsaje has quit IRC | 13:18 | |
anteaya | awesome thanks BobBall | 13:19 |
anteaya | that would be a great help | 13:19 |
BobBall | don't be so sure! | 13:19 |
*** nsaje has joined #openstack-infra | 13:19 | |
anteaya | just glad you are willing to look | 13:19 |
BobBall | One better way of doing this - is there a way to get a list of all changes that have been verified but failed gate? | 13:23 |
*** herndon_ has joined #openstack-infra | 13:24 | |
*** nsaje has quit IRC | 13:24 | |
anteaya | BobBall: http://status.openstack.org/rechecks/ | 13:27 |
anteaya | BobBall: http://status.openstack.org/elastic-recheck/ | 13:27 |
*** zoresvit has quit IRC | 13:27 | |
anteaya | there is a conversation going that we may disable elastic-recheck, as it might be being abused and causing folks to push through bad patches | 13:28 |
anteaya | or contributing to that behaviour if not causing | 13:28 |
*** zoresvit has joined #openstack-infra | 13:30 | |
BobBall | elastic-recheck is close to what I was thinking... the rechecks page doesn't include all of the gate failures - does it even get updated with reverify vs recheck? | 13:32 |
*** amotoki has joined #openstack-infra | 13:32 | |
*** ruhe has joined #openstack-infra | 13:34 | |
BobBall | heh... I see enough people have been on the two bugs that are causing problems | 13:34 |
*** ruhe has quit IRC | 13:34 | |
BobBall | the neutron one and the console logs one | 13:35 |
*** ruhe has joined #openstack-infra | 13:35 | |
BobBall | I suspect I can't help with either, but I'll look at the console logs one just in case! | 13:35 |
*** ilyashakhat has quit IRC | 13:35 | |
anteaya | okay look back in the logs last night for a conversation between mikal and clarkb | 13:35 |
anteaya | they were working on it and had saved a few nodes to see if that would help | 13:36 |
*** ilyashakhat has joined #openstack-infra | 13:36 | |
anteaya | BobBall: and noone is on this bug yet: https://bugs.launchpad.net/neutron/+bug/1251784 | 13:37 |
uvirtbot | Launchpad bug 1251784 in nova "nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached" [Critical,New] | 13:37 |
anteaya | I was just about to ask about it in -neutron | 13:37 |
*** dkranz has joined #openstack-infra | 13:37 | |
anteaya | do you have time to help on that one? | 13:37 |
BobBall | I have absolutely no knowledge in the area :/ | 13:37 |
*** DinaBelova has quit IRC | 13:37 | |
anteaya | understood | 13:38 |
anteaya | thanks though | 13:38 |
BobBall | I wish I did - I hit that in gate a few months ago - but it was claimed to have been fixed | 13:38 |
BobBall | That was https://bugs.launchpad.net/bugs/1211915 | 13:39 |
uvirtbot | Launchpad bug 1211915 in neutron/havana "Connection to neutron failed: Maximum attempts reached" [High,Fix committed] | 13:39 |
*** osanchez is now known as cdk_gerritbot | 13:40 | |
*** cdk_gerritbot is now known as osanchez | 13:40 | |
*** zul has quit IRC | 13:41 | |
*** yaguang has joined #openstack-infra | 13:41 | |
*** zul has joined #openstack-infra | 13:41 | |
anteaya | BobBall: according to jog0 this bug has a different root cause | 13:44 |
anteaya | I personally don't know enough about the internals to verify independently | 13:44 |
anteaya | I am, for better or worse, trusting jog0 on this assessment | 13:44 |
BobBall | ah ok | 13:45 |
BobBall | indeed :) | 13:45 |
*** afazekas_mtg is now known as afazekas | 13:45 | |
*** dcramer_ has quit IRC | 13:46 | |
*** cyril has joined #openstack-infra | 13:46 | |
*** osanchez has quit IRC | 13:46 | |
*** DinaBelova has joined #openstack-infra | 13:46 | |
*** osanchez has joined #openstack-infra | 13:46 | |
*** cyril is now known as Guest97079 | 13:46 | |
*** mihgen_ has joined #openstack-infra | 13:47 | |
*** ogelbukh has quit IRC | 13:47 | |
*** sarob has joined #openstack-infra | 13:47 | |
*** mihgen has quit IRC | 13:47 | |
*** mihgen_ is now known as mihgen | 13:47 | |
*** ilyashakhat_ has joined #openstack-infra | 13:48 | |
*** ilyashakhat has quit IRC | 13:48 | |
*** jcooley_ has quit IRC | 13:49 | |
anteaya | :D | 13:50 |
*** ogelbukh has joined #openstack-infra | 13:50 | |
*** cody-somerville has joined #openstack-infra | 13:50 | |
openstackgerrit | Marcus Nilsson proposed a change to openstack-infra/jenkins-job-builder: Added support for Stash Notifier https://review.openstack.org/56337 | 13:51 |
openstackgerrit | Thierry Carrez proposed a change to openstack-infra/config: Track icehouse development in releasestatus https://review.openstack.org/57441 | 13:51 |
*** nsaje has joined #openstack-infra | 13:51 | |
*** nsaje has quit IRC | 13:51 | |
*** nsaje has joined #openstack-infra | 13:52 | |
*** nsaje has quit IRC | 13:52 | |
*** nsaje has joined #openstack-infra | 13:52 | |
*** sarob has quit IRC | 13:52 | |
*** sarob has joined #openstack-infra | 13:53 | |
anteaya | so fungi I know that the gate will be your immediate concern upon arrival | 13:53 |
anteaya | something to know is that marun is working on one of the -neutron gate block bugs | 13:54 |
anteaya | and needs to talk to you, about another job is it marun? | 13:54 |
marun | anteaya: the job addition is a side-project to introduce functional testing, it won't fix gate issues | 13:54 |
anteaya | ah sorry, I mis-understood | 13:55 |
anteaya | so for later when introducing functional testing | 13:55 |
marun | we already have functional tests in the tree, but some of them can't run because they need sudo privileges | 13:57 |
marun | someone suggested at the summit to create a new functional-only job that runs as the tempest user so that sudo is allowed | 13:58 |
sgran | can I ask for a review of https://review.openstack.org/#/c/57311/ when someone has a moment, please? | 13:59 |
sgran | it will allow me to make a real change in neutron afterwards | 14:00 |
*** dkliban has joined #openstack-infra | 14:01 | |
*** julim has joined #openstack-infra | 14:04 | |
*** yamahata_ has quit IRC | 14:06 | |
*** yamahata_ has joined #openstack-infra | 14:06 | |
*** dolphm has joined #openstack-infra | 14:06 | |
*** sarob has quit IRC | 14:08 | |
*** ruhe has quit IRC | 14:10 | |
*** ruhe has joined #openstack-infra | 14:12 | |
*** hashar has quit IRC | 14:14 | |
*** ruhe has quit IRC | 14:16 | |
*** jergerber has joined #openstack-infra | 14:18 | |
*** CaptTofu has quit IRC | 14:19 | |
*** CaptTofu has joined #openstack-infra | 14:19 | |
*** thomasm has joined #openstack-infra | 14:22 | |
*** thomasm is now known as Guest98408 | 14:23 | |
*** mriedem has joined #openstack-infra | 14:23 | |
fungi | anteaya: i'm no longer concerned, just holding out hope that someone will fix the current bugs in openstack which are slowing down gating | 14:23 |
*** thomasem has quit IRC | 14:23 | |
*** Guest98408 is now known as thomasem | 14:23 | |
fungi | sgran: i'm happy to review your tempest change, but keep in mind that the infrastructure team aren't core reviewers on tempest... you probably want the qa team (headquartered in #openstack-qa) | 14:24 |
sgran | ah, great | 14:24 |
anteaya | fungi: okay | 14:25 |
sgran | and, the more reviewers the better, so please :) | 14:25 |
*** jamesmcarthur has joined #openstack-infra | 14:25 | |
fungi | anteaya: it looks like zuul, jenkins, nodepool et al are working as intended, so i'm actually thrilled they don't fall over under a worst-case testing situation such as this | 14:27 |
anteaya | yes | 14:28 |
*** AJaeger has left #openstack-infra | 14:28 | |
anteaya | okay wasn't sure on what you wanted to address first this morning | 14:28 |
fungi | it looks like the gate actually caught up quite a bit but then a lot of new changes started getting approved 4-5 hours ago, bringing it up to its current length | 14:28 |
anteaya | ah | 14:28 |
fungi | (note the 8-hour sparklines above the different pipelines) | 14:29 |
anteaya | just came out of one of the neutron meetings, can't keep track of them all yet, mestery needs to discuss the creation of a testing structure for multi-node testing | 14:29 |
anteaya | fungi: the sparklines are coming back up aren't they | 14:30 |
*** dprince has quit IRC | 14:30 | |
*** weshay has joined #openstack-infra | 14:31 | |
fungi | the change at the head of the integrated gate queue was approved a little over 14 hours ago, so things actually *are* moving | 14:31 |
fungi | it only seems like wading through pitch compared to how fast things get through on sunnier days | 14:32 |
anteaya | yes | 14:34 |
*** dprince has joined #openstack-infra | 14:34 | |
anteaya | morning dprince | 14:34 |
dprince | anteaya: hello | 14:34 |
anteaya | :D | 14:35 |
*** markmc has joined #openstack-infra | 14:37 | |
*** mfer has joined #openstack-infra | 14:40 | |
*** boris-42_ has joined #openstack-infra | 14:40 | |
*** boris-42 has quit IRC | 14:41 | |
ttx | mordred: in case you missed it, you were #action-ed to create a thread summarizing the issue with glance client lib branches on the ML, to kick off the discussion there | 14:42 |
ttx | fun, uh? | 14:43 |
*** thomasem has quit IRC | 14:44 | |
*** thomasem has joined #openstack-infra | 14:44 | |
*** sarob has joined #openstack-infra | 14:47 | |
*** senk has joined #openstack-infra | 14:50 | |
BobBall | Where is the devstack-gate.yaml housed now? I think we need the same change as https://review.openstack.org/#/c/53249/2/modules/openstack_project/files/jenkins_job_builder/config/devstack-gate.yaml in grenade (up timeout to 90 minutes) - hit in check queue at http://logs.openstack.org/31/57431/2/check/check-grenade-devstack-vm/d8cfbd8/console.html | 14:51 |
*** ekarlso has quit IRC | 14:51 | |
*** ekarlso has joined #openstack-infra | 14:52 | |
*** ekarlso has quit IRC | 14:52 | |
BobBall | oh... heh... typical. I just found it in config | 14:52 |
*** wenlock has joined #openstack-infra | 14:53 | |
openstackgerrit | Marcus Nilsson proposed a change to openstack-infra/jenkins-job-builder: Added support for Stash Notifier https://review.openstack.org/56337 | 14:53 |
*** ekarlso has joined #openstack-infra | 14:53 | |
*** johnthetubaguy1 has joined #openstack-infra | 14:54 | |
*** johnthetubaguy has quit IRC | 14:54 | |
*** changbl has joined #openstack-infra | 14:56 | |
*** dolphm has quit IRC | 14:56 | |
anteaya | BobBall: config, the black hole of all that is | 14:57 |
*** dolphm has joined #openstack-infra | 14:57 | |
openstackgerrit | Bob Ball proposed a change to openstack-infra/config: Increase timeout for grenade to 90 minutes https://review.openstack.org/57450 | 14:58 |
anteaya | BobBall: oh sorry, I didn't see that before | 14:59 |
*** luisg has joined #openstack-infra | 14:59 | |
anteaya | we are currently frowning on increasing timeouts as a way to solve testing problems | 14:59 |
BobBall | shame... :) | 14:59 |
anteaya | that just contributes to the mass expansion of all the tests | 14:59 |
anteaya | doesn't mean you don't have a case for it | 14:59 |
*** sandywalsh has quit IRC | 15:00 | |
anteaya | but you might be encouraged to pursue other options first | 15:00 |
*** dcramer_ has joined #openstack-infra | 15:00 | |
anteaya | ttx let's hope 56150 makes it through the gate | 15:01 |
*** dizquierdo has joined #openstack-infra | 15:01 | |
BobBall | I wouldn't know where to start - it's a grenade failure where things just got killed after 60 minutes - and it's running code unrelated to my change (since my change was in xenapi and grenade is running KVM!) :) | 15:01 |
anteaya | BobBall: have you tried engaging anyone in -qa into conversation about it? | 15:02 |
BobBall | heh | 15:03 |
BobBall | no | 15:03 |
BobBall | but it's ok | 15:03 |
BobBall | this is just another symptom of the gate being broken | 15:03 |
*** sandywalsh has joined #openstack-infra | 15:03 | |
* BobBall deletes his change | 15:03 | |
*** nati_ueno has joined #openstack-infra | 15:03 | |
fungi | yeah, on 57450 i'd want to see some consensual +1s from the qa core devs that it's really the best way out | 15:03 |
BobBall | 2013-11-20 12:51:54.621 | inet 10.7.205.65/15 brd 10.7.255.255 scope global eth0 | 15:03 |
BobBall | 2013-11-20 13:27:54.376 | Triggered by: https://review.openstack.org/57431 patchset 2 | 15:03 |
BobBall | more than 30 minutes just waiting | 15:03 |
BobBall | doing nothing :) | 15:04 |
fungi | in cases like that, often increasing the timeout does nothing other than let the job sit there stuck for longer before getting killed, which is a step in the wrong direction of course | 15:04 |
*** dkliban_ has joined #openstack-infra | 15:04 | |
BobBall | yup - agreed | 15:05 |
*** dkliban has quit IRC | 15:05 | |
BobBall | although in my case it would have just made it through because it was finally running tests quickly | 15:05 |
*** wenlock has quit IRC | 15:05 | |
BobBall | dunno what the pause was caused by | 15:05 |
*** sarob has quit IRC | 15:05 | |
*** dcramer_ has quit IRC | 15:06 | |
*** alcabrera is now known as alcabrera|afk | 15:09 | |
*** odyssey4me has quit IRC | 15:09 | |
*** herndon_ has quit IRC | 15:12 | |
openstackgerrit | Roman Prykhodchenko proposed a change to openstack-infra/devstack-gate: Support Ironic in devstack gate https://review.openstack.org/53899 | 15:12 |
*** senk has quit IRC | 15:12 | |
*** senk has joined #openstack-infra | 15:16 | |
*** blamar has quit IRC | 15:16 | |
*** ruhe has joined #openstack-infra | 15:17 | |
*** odyssey4me has joined #openstack-infra | 15:18 | |
Alex_Gaynor | ttx: A phrase you might like: DX, "Developer experience", UX for developers. Meaning stuff like docs, good sdks, cli clients, etc. | 15:18 |
*** dcramer_ has joined #openstack-infra | 15:18 | |
*** davidhadas has joined #openstack-infra | 15:22 | |
*** thedodd has joined #openstack-infra | 15:23 | |
annegentle | ttx: I like DX too | 15:23 |
*** hashar has joined #openstack-infra | 15:23 | |
fungi | interfaces for sentient life forms | 15:23 |
*** blamar has joined #openstack-infra | 15:23 | |
*** dcramer__ has joined #openstack-infra | 15:23 | |
*** xeyed4good has joined #openstack-infra | 15:24 | |
*** ruhe has quit IRC | 15:25 | |
annegentle | fungi: heh | 15:25 |
annegentle | IFSLF | 15:25 |
*** dcramer_ has quit IRC | 15:25 | |
fungi | annegentle: oh, heads up, yesterday i tagged the tip of the stable/folsom branch of openstack-manuals with "folsom-eol" and then removed the branch (same thing was done for essex and diablo during previous cycles). if you think it's causing/caused any issues, let me know so we can work through them | 15:27 |
annegentle | fungi: should be fine (off the top of my head) | 15:27 |
fungi | k, great | 15:27 |
annegentle | fungi: we already redirect away from folsom | 15:27 |
*** ben_duyujie has joined #openstack-infra | 15:27 | |
*** davidhadas has quit IRC | 15:28 | |
fungi | well, it doesn't remove any documents which were already published... just prevents you from being able to land new changes to that branch any longer | 15:28 |
*** davidhadas has joined #openstack-infra | 15:29 | |
*** ben_duyujie has quit IRC | 15:30 | |
SergeyLukjanov | hi folks, I have a question about testr... we're using some resource files from tests, what's the right function to load them? | 15:30 |
SergeyLukjanov | we're using the code like open(pkg.resource_filename(version.version_info.package, file_name)).read() to read such files now | 15:30 |
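For reference, a minimal sketch of that loading pattern using pkg_resources; the package name and fixture handling below are illustrative stand-ins rather than Savanna's actual layout:

```python
# Illustrative only: load a test fixture that ships inside the package.
import pkg_resources


def load_fixture(file_name):
    # Resolves the file relative to the installed package, so the lookup
    # works the same under nose, testr, or an installed distribution.
    path = pkg_resources.resource_filename('savanna', file_name)
    with open(path) as f:
        return f.read()
```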
sdague | BobBall: so in that failure it took 30 minutes to prep the node in nodepool | 15:31 |
*** ben_duyujie has joined #openstack-infra | 15:31 | |
BobBall | Indeed. I was too keen to follow a previous bug that upped the timeout for devstack-full :) | 15:32 |
annegentle | dumb question. is lifeless on the tc? | 15:33 |
notmyname | fungi: starting (my) day with 137 gate jobs. do I need to go buy a plunger or a snake? | 15:33 |
fungi | notmyname: stuff is actually moving through. i think it got down to around 50ish changes for a while around 0900 utc, but then a bunch of changes started getting approved, which has been steadily increasing since | 15:34 |
*** ben_duyujie1 has joined #openstack-infra | 15:34 | |
*** pblaho has quit IRC | 15:35 | |
fungi | er, by around 0900 utc | 15:35 |
ttx | Alex_Gaynor: yes, in our case U ~= D | 15:35 |
fungi | spot checking when i first got up, changes at the head of the queue were approved around 14 hours previous | 15:35 |
ttx | (imho) | 15:36 |
notmyname | fungi: ok. let me know if we need to stop (or start) doing stuff on the swift side. we've got 12 patches approved but not landed because of gate issues | 15:36 |
openstackgerrit | Jaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add batch_tasks support. https://review.openstack.org/57469 | 15:36 |
anteaya | annegentle: yes he is | 15:36 |
anteaya | not a dumb question | 15:36 |
*** dcramer__ has quit IRC | 15:37 | |
*** mihgen has quit IRC | 15:37 | |
fungi | notmyname: there was one swift-related failure cropping up on some jobs, but portante mentioned last night that he'd look into the details | 15:37 |
*** wenlock has joined #openstack-infra | 15:37 | |
*** ben_duyujie has quit IRC | 15:38 | |
*** nsaje has quit IRC | 15:38 | |
notmyname | fungi: that was the timeout issue? ie it's taking something like 35+ seconds to talk to a storage node. the default timeout is 10 seconds, but the root cause is that there are drive contention issues. ie the disk is backing up and not letting stuff get flushed | 15:38 |
annegentle | anteaya: ah robert collins | 15:38 |
annegentle | anteaya: thanks | 15:38 |
anteaya | np | 15:38 |
*** nsaje has joined #openstack-infra | 15:39 | |
notmyname | fungi: I'm torn on the "solution" of raising the timeout. it may improve stuff, but it doesn't seem to be addressing the root issue | 15:39 |
*** datsun180b has joined #openstack-infra | 15:39 | |
fungi | notmyname: ahh, could have been. i think jog0 brought it up, but not sure of the specifics | 15:39 |
*** davidhadas has quit IRC | 15:39 | |
notmyname | fungi: I haven't talked to portante about it yet today, but I saw the emails | 15:39 |
fungi | likely then | 15:40 |
*** rcleere has joined #openstack-infra | 15:40 | |
*** senk has quit IRC | 15:40 | |
*** rnirmal has joined #openstack-infra | 15:41 | |
*** nsaje has quit IRC | 15:43 | |
*** yamahata_ has quit IRC | 15:43 | |
*** senk has joined #openstack-infra | 15:45 | |
*** xeyed4good has left #openstack-infra | 15:46 | |
*** CaptTofu has quit IRC | 15:46 | |
*** CaptTofu has joined #openstack-infra | 15:46 | |
portante | notmyname: long response times from the object server do not always mean that the disk is the bottleneck | 15:47 |
*** boris-42_ is now known as boris-42 | 15:47 | |
*** sarob has joined #openstack-infra | 15:47 | |
*** senk has quit IRC | 15:48 | |
*** senk has joined #openstack-infra | 15:49 | |
*** miqui has joined #openstack-infra | 15:49 | |
miqui | hello... | 15:50 |
miqui | anyone know a workaround for this: https://code.google.com/p/gerrit/issues/detail?id=1884 | 15:50 |
*** nati_ueno has quit IRC | 15:50 | |
*** odyssey4me has quit IRC | 15:50 | |
openstackgerrit | Jaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add seealso to batch_task from promoted_build. https://review.openstack.org/57473 | 15:52 |
russellb | fun stat of the day ... 127,207 reviews in the last 365 days (avg 348.5/day) across integrated/incubated projects | 15:53 |
anteaya | w00t | 15:54 |
anteaya | a year in a day | 15:54 |
anteaya | almost | 15:54 |
*** atiwari has joined #openstack-infra | 15:56 | |
*** dcramer__ has joined #openstack-infra | 15:57 | |
*** sarob has quit IRC | 15:58 | |
*** jaypipes has joined #openstack-infra | 15:58 | |
*** ilyashakhat_ has quit IRC | 15:58 | |
*** ilyashakhat has joined #openstack-infra | 15:58 | |
*** hdd has joined #openstack-infra | 15:59 | |
mordred | SergeyLukjanov: that should be fine with testr as well | 15:59 |
mordred | SergeyLukjanov: are you hitting problems with it? | 15:59 |
*** jcoufal has quit IRC | 16:00 | |
*** CaptTofu has quit IRC | 16:01 | |
SergeyLukjanov | mordred, yep, there were some problems with it before the summit, but now i'm hitting another error - http://paste.openstack.org/show/53684/ | 16:01 |
*** CaptTofu has joined #openstack-infra | 16:01 | |
portante | notmyname: there are other errors in those logs that require steve lang's fix at https://review.openstack.org/57019 | 16:02 |
portante | but we can't get that through the gate jobs. :( | 16:02 |
portante | mordred, jog0, clarkb: how can we get some commits through to help with the gate job issues? | 16:03 |
*** jcooley_ has joined #openstack-infra | 16:04 | |
mordred | SergeyLukjanov: that looks like a normal import issue - is that in trunk? or in a patch? | 16:04 |
SergeyLukjanov | mordred, here is the patch https://review.openstack.org/#/c/57477 for moving to testr from nosetests | 16:05 |
mordred | ah. neat | 16:05 |
*** UtahDave has joined #openstack-infra | 16:05 | |
*** marun has quit IRC | 16:06 | |
*** markmc has quit IRC | 16:07 | |
mordred | SergeyLukjanov: looking now | 16:07 |
SergeyLukjanov | mordred, thank you, that's very strange, tests work ok with nose | 16:08 |
*** dolphm is now known as dolphm_afk | 16:08 | |
SergeyLukjanov | mordred, I expect some resources-related failures | 16:08 |
fungi | portante: we have a few options, none of them ideal... two main possibilities are to force the changes in without final testing (gross) or dump the entire zuul state then reenqueue it all with your proposed fix at the front and hope it passes... or we just wait and cross our fingers (current head of the gate was approved roughly 15.75 hours ago) | 16:13 |
portante | fungi: how do we determine which of the three to take? | 16:16 |
portante | I looked last night and the gate queue was at 107 or so, and now it's at 137 | 16:16 |
portante | can I get access to a running set of systems to look at how the overall VMs (assuming we are not running directly on hardware) are behaving? | 16:18 |
portante | fungi? | 16:18 |
fungi | portante: if you follow the sparklines you'll see it got fairly low overnight. we're actually landing quite a few changes, it's just that the approval rate on changes has been relatively high coupled with current bugs in openstack making them moderately untestable | 16:18 |
fungi | sorry, i'll try to answer your questions in sequence here... | 16:18 |
fungi | so in the past, any solution involving "jumping the gate" has come down to a fairly involved discussion between infra and qa usually. we try really, really hard not to further compromise the state of openstack by doing that | 16:19 |
portante | certainly, requeuing seems like the best option | 16:20 |
mordred | SergeyLukjanov: so - what's happening | 16:20 |
fungi | in some ways, the fact that people have approved buggy changes into openstack is contributing to slowing the overall pace of openstack development, which in a one-project big-picture view could be thought of as a self-imposed limit on the rate of development over quality | 16:20 |
mordred | is that discover scans through the python module path | 16:20 |
mordred | which causes code in __init__.py files to get executed | 16:20 |
fungi | er, big picture | 16:20 |
portante | agreed | 16:20 |
mordred | SergeyLukjanov: in savanna/conductor/__init__.py for instance, there is code that's executed | 16:21 |
portante | that comes down to individual teams having effective unit test strategies that help lower those incidents, no? | 16:21 |
fungi | or not enough cross-team involvement in helping each other fix that situation | 16:21 |
*** alcabrera|afk is now known as alcabrera | 16:21 | |
portante | fungi: perhaps | 16:22 |
mordred | SergeyLukjanov: which it seems may expect to be executed in a particular context, that's now not there when discover is doing the test scan | 16:22 |
mordred | SergeyLukjanov: in general, executing code with side effects in __init__.py is an anti-pattern and should be avoided | 16:22 |
mordred | (for reasons such as this here) | 16:22 |
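As a rough illustration of that anti-pattern (module contents invented for the example): anything executed at package import time runs as soon as discovery imports the package, so deferring the work behind a function keeps imports side-effect free:

```python
# hypothetical savanna/conductor/__init__.py style code, simplified
import os

# anti-pattern (shown commented out): runs the moment the package is
# imported, e.g. while `testtools.run discover` walks the module path
# CONF_PATH = os.environ["SAVANNA_CONF"]   # KeyError under discovery

_conf_path = None


def get_conf_path():
    # safer: the side effect happens only when a caller actually needs it
    global _conf_path
    if _conf_path is None:
        _conf_path = os.environ.get("SAVANNA_CONF", "/etc/savanna/itest.conf")
    return _conf_path
```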
fungi | as for vm performance, are you talking about from the perspective of the slow response on swift objects issue seen, or just a general impression that slow virtual machines are making the gate slow (which for the latter i don't see any evidence to support) | 16:22 |
*** jcooley_ has quit IRC | 16:22 | |
portante | slow response on the swift objects | 16:23 |
portante | I'd like to just peek at the "VMs" and poke around | 16:23 |
fungi | portante: so, we do collect sysstat logs of the entire tempest run. those should provide some statistics on performance of various resources on the machine during the course of the test and can be correlated to the logs from the test | 16:23 |
* fungi digs up an example | 16:24 | |
portante | okay, is that in the typical logs/ directory? | 16:24 |
fungi | yes | 16:24 |
*** mgagne has joined #openstack-infra | 16:24 | |
fungi | portante: such as http://logs.openstack.org/65/51865/1/gate/gate-tempest-devstack-vm-postgres-full/da2a423/logs/sysstat.dat.gz | 16:25 |
fungi | (random example selected) | 16:25 |
*** nsaje has joined #openstack-infra | 16:25 | |
fungi | should be able to feed that into sar and examine various resources over the course of the test | 16:25 |
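For example, pulling one of those files down and replaying it locally might look like this (URL truncated; the sysstat version used to read the file should roughly match the one that wrote it):

```
wget http://logs.openstack.org/.../logs/sysstat.dat.gz && gunzip sysstat.dat.gz
sar -f sysstat.dat -u        # CPU utilization across the run
sar -f sysstat.dat -r        # memory usage
sar -f sysstat.dat -n SOCK   # socket counts (relevant to the TCP observation later)
```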
mordred | SergeyLukjanov: ah - actually | 16:25 |
mordred | SergeyLukjanov: I lied | 16:25 |
mordred | I believe it has to do with file paths | 16:26 |
mordred | SergeyLukjanov: ok. I'm just lying a lot | 16:26 |
*** DinaBelova has quit IRC | 16:26 | |
fungi | portante: as far as getting access to virtual machines, if you decide the only efficient way is to ssh into one of the machines where a test ran into this particular issue, our best bet is to proactively mark a bunch of machines in a held state (enough to be statistically likely that one will encounter that bug) so that they won't automatically be garbage collected, and then hope we catch one | 16:27 |
fungi | but doing so reduces the overall available pool in our aggregate quota, and so reduces test velocity for other changes even further, so it's also not without a downside | 16:28 |
fungi | if you simply want a vm configured similarly to how we run tests, i wrote https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/README.rst#n100 to help developers recreate the conditions under which we run devstack/tempest like jobs | 16:30 |
portante | fungi: cool | 16:30 |
portante | what is the format of sysstat.dat? | 16:30 |
portante | sar output? | 16:30 |
*** dolphm_afk is now known as dolphm_ | 16:31 | |
*** dolphm_ is now known as dolphm | 16:31 | |
fungi | yes | 16:31 |
portante | k | 16:31 |
*** Hefeweizen has joined #openstack-infra | 16:31 | |
mordred | SergeyLukjanov: INFO: Configuration file "itest.conf" not found * | 16:32 |
*** jcoufal has joined #openstack-infra | 16:33 | |
*** ^d has joined #openstack-infra | 16:33 | |
fungi | portante: if you look in devstack's stack.sh, you'll see that enabling "sysstat" as a devstack service runs sar -o $SCREEN_LOGDIR/$SYSSTAT_FILE $SYSSTAT_INTERVAL | 16:34 |
fungi | so that file is the result, which we collect at the end of the job | 16:34 |
mordred | SergeyLukjanov: ugh. I'm not sure. the things that I thought were the cause are not the cause | 16:34 |
mordred | SergeyLukjanov: but all of those errors are errors it's finding while trying to run discover | 16:35 |
portante | fungi: got it | 16:35 |
mordred | SergeyLukjanov: if you want to poke at it more, you can run | 16:36 |
mordred | SergeyLukjanov: python -m testtools.run discover savanna | less | 16:36 |
mordred | SergeyLukjanov: which will run things outside of the context of testr (you'll want to be in a venv of course) | 16:36 |
fungi | miqui: i haven't seen that issue before. were you encountering it on review.openstack.org or elsewhere? the bug reports linked suggest that it may have been a problem in 2.3 and possibly 2.5 but not 2.4 (which is what we currently run) | 16:37 |
mordred | SergeyLukjanov: oh - it is running tests - so I'm guessing perhaps something isn't cleaning up and making it hard for something else to import | 16:39 |
mordred | SergeyLukjanov: it's possible you should not listen to me | 16:39 |
portante | fungi: can we get the sar data to collect using "-S DISK" and "-S XDISK"? | 16:41 |
portante | there does not appear to be any data on disk behavior | 16:41 |
fungi | portante: good question... sdague/dtroyer: do you object to adding those? | 16:42 |
SergeyLukjanov | mordred, sorry, was afk for last 15 mins | 16:43 |
*** senk has quit IRC | 16:43 | |
annegentle | markmc or others: do you know of auto-backport scripts for cherry-picking patches? | 16:44 |
dtroyer | fungi: I don't think that would be a problem off hand... | 16:44 |
SergeyLukjanov | mordred, got the idea, we'll try to debug it, thank you very much! sorry for afk | 16:45 |
*** svarnau has joined #openstack-infra | 16:46 | |
*** nsaje has quit IRC | 16:47 | |
*** sarob has joined #openstack-infra | 16:47 | |
*** nsaje has joined #openstack-infra | 16:47 | |
*** ^d is now known as ^demon|sick | 16:48 | |
fungi | portante: i guess try running devstack with the "sysstat" devstack service enabled but patch that line to add those options and see if it generates the output you expect. if so, the patch to devstack should be exceedingly simple | 16:48 |
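The candidate patch would be a one-line tweak to that invocation, along these lines; note that -S DISK and -S XDISK are really sadc keywords, so whether sar passes them through during collection is exactly what a local devstack run would need to confirm before proposing it:

```
# devstack stack.sh, sysstat service (current form per the line quoted above):
#   sar -o $SCREEN_LOGDIR/$SYSSTAT_FILE $SYSSTAT_INTERVAL
# candidate change to also collect (extended) disk statistics:
sar -S DISK -S XDISK -o $SCREEN_LOGDIR/$SYSSTAT_FILE $SYSSTAT_INTERVAL
```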
*** afazekas has quit IRC | 16:48 | |
mordred | SergeyLukjanov: I'm not sure that side effect is the problem | 16:50 |
mordred | SergeyLukjanov: I tried fixing it locally and it did not fix the problem | 16:50 |
mordred | but it's _something_ having to do with imports | 16:50 |
*** jcoufal has quit IRC | 16:51 | |
portante | fungi: have you noticed that the number of tcp sockets in use starts at 5 and ramps up to 105 and never returns? | 16:51 |
SergeyLukjanov | mordred, ok, maybe there's some other side effect | 16:51 |
portante | is everything supposed to be shutdown when sar stops collecting? | 16:52 |
*** sarob has quit IRC | 16:52 | |
*** mrodden has quit IRC | 16:52 | |
*** nsaje has quit IRC | 16:52 | |
*** jcooley_ has joined #openstack-infra | 16:53 | |
fungi | portante: i'm not sure. the #openstack-qa channel is probably a better place to dig into details like that | 16:54 |
*** jcoufal-mob has joined #openstack-infra | 16:54 | |
portante | k | 16:55 |
*** sparkycollier has joined #openstack-infra | 16:58 | |
*** pcrews has quit IRC | 16:59 | |
*** gyee has joined #openstack-infra | 17:00 | |
*** mihgen has joined #openstack-infra | 17:01 | |
*** mrodden has joined #openstack-infra | 17:04 | |
*** jpich has quit IRC | 17:05 | |
*** dcramer__ has quit IRC | 17:05 | |
*** mihgen has quit IRC | 17:05 | |
*** ftcjeff has joined #openstack-infra | 17:08 | |
*** jcoufal-mob has quit IRC | 17:09 | |
*** jcooley_ has quit IRC | 17:10 | |
*** bpokorny has joined #openstack-infra | 17:10 | |
*** mihgen has joined #openstack-infra | 17:12 | |
mkoderer | does somebody know if there are special privileges needed to host a meeting in #openstack-meeting? or does meetbot accept everybody? | 17:13 |
*** CaptTofu has quit IRC | 17:15 | |
fungi | mkoderer: meetbot is an outgoing chap and will be anyone's friend. no privs needed | 17:16 |
*** CaptTofu has joined #openstack-infra | 17:17 | |
mkoderer | fungi: cool thx | 17:17 |
fungi | we also recently added a feature allowing anyone (not just the chairperson) to #endmeeting an hour or more after a #startmeeting | 17:18 |
fungi | since chairs were sometimes forgetting to do it | 17:19 |
*** dcramer__ has joined #openstack-infra | 17:19 | |
*** yaguang has quit IRC | 17:19 | |
zul | hey is there something going on with the gates? there seems to be some python-keystoneclient stuff that has been approved but hasn't gone in yet | 17:20 |
portante | zul: join the club | 17:20 |
portante | ;) | 17:20 |
*** dkliban_ has quit IRC | 17:22 | |
*** dcramer__ has quit IRC | 17:25 | |
*** hashar has quit IRC | 17:25 | |
*** nsaje has joined #openstack-infra | 17:25 | |
*** derekh has quit IRC | 17:27 | |
anteaya | zul | 17:27 |
anteaya | yes, a bad start to the week and many gate bugs | 17:27 |
zul | anteaya: hi | 17:27 |
zul | ok cool | 17:27 |
anteaya | http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html | 17:27 |
anteaya | zul while I have you here can we address novnc pulling in nova packages on 12.04? | 17:28 |
anteaya | it is interfering with many a devstack install | 17:28 |
zul | anteaya: sure, open up a bug and i'll have a look | 17:28 |
sdague | mkoderer: no special privs | 17:28 |
anteaya | zul: https://bugs.launchpad.net/devstack/+bug/1248923 | 17:29 |
uvirtbot | Launchpad bug 1248923 in devstack "Devstack install is failing:" [Undecided,Confirmed] | 17:29 |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce import grouping https://review.openstack.org/52221 | 17:29 |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce grouping like imports together https://review.openstack.org/54402 | 17:29 |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce import group ordering https://review.openstack.org/54403 | 17:29 |
anteaya | seems to be a packaging issue, zul, what do you think? | 17:29 |
zul | anteaya: probably, i'll have a look | 17:30 |
anteaya | thanks | 17:30 |
*** Bada has joined #openstack-infra | 17:32 | |
*** dkliban_ has joined #openstack-infra | 17:34 | |
*** sileht is now known as sileht_ | 17:40 | |
*** sileht_ is now known as sileht | 17:40 | |
*** chandankumar has quit IRC | 17:40 | |
*** boris-42 has quit IRC | 17:42 | |
*** jamesmcarthur has quit IRC | 17:44 | |
*** afazekas has joined #openstack-infra | 17:45 | |
*** reed_ has joined #openstack-infra | 17:45 | |
*** jamesmcarthur has joined #openstack-infra | 17:47 | |
*** SergeyLukjanov has quit IRC | 17:48 | |
*** salv-orlando has quit IRC | 17:49 | |
clarkb | morning | 17:49 |
* clarkb catches up on sb | 17:49 | |
*** senk has joined #openstack-infra | 17:52 | |
*** salv-orlando has joined #openstack-infra | 17:54 | |
anteaya | morning clarkb | 17:55 |
*** sarob has joined #openstack-infra | 17:57 | |
portante | fungi: so what is up with the gates, all the jobs are queued, nothing running, it appears? | 17:57 |
clarkb | portante: I think we are out of available test nodes | 17:58 |
portante | oy | 17:58 |
clarkb | the jobs that are running are using all available resources | 17:58 |
portante | so the check jobs are interfering with the gate jobs? | 17:58 |
fungi | portante: up in the top-left you'll see event and result totals | 17:58 |
*** pcrews has joined #openstack-infra | 17:58 | |
fungi | immediately following a gate reset, zuul blocks to process the events/results to determine what to do next | 17:59 |
fungi | the more changes impacted by a gate reset, the higher the total number of factors it ends up taking into account, and the longer that takes to happen | 17:59 |
*** ruhe has joined #openstack-infra | 17:59 | |
portante | okay | 17:59 |
*** sarob has quit IRC | 18:00 | |
fungi | but in addition, as clarkb points out, right now we have more pending jobs than we have nodes, so a gate reset effectively depletes the entire pool and has to wait for new nodes to be added since we're maxxed out on our quotas with our providers right now | 18:00 |
*** sarob has joined #openstack-infra | 18:00 | |
fungi | looking at the graph in the bottom-left, you'll see the swing between used and deleting which accompanies each gate reset | 18:00 |
clarkb | I think there may be a slight compounding problem where our 2.5 jenkins masters can't deal with the amount of load being thrown at them | 18:01 |
clarkb | hence the long delay BobBall saw earlier | 18:01 |
fungi | entirely possible. we've been driving the jenkins masters like sled dogs | 18:02 |
clarkb | jog0: also grenade smoke tests run in parallel now? I think we may be seeing some failures there related to parallel testing | 18:02 |
clarkb | tl;dr ugh | 18:02 |
SpamapS | holy overwhelming QA fail batman.. http://status.openstack.org/rechecks/ is overrun | 18:02 |
*** harlowja has joined #openstack-infra | 18:02 | |
*** metabro has quit IRC | 18:03 | |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce import grouping https://review.openstack.org/52221 | 18:03 |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce grouping like imports together https://review.openstack.org/54402 | 18:03 |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce import group ordering https://review.openstack.org/54403 | 18:03 |
clarkb | we should just rebase everything to havana and start over >_> | 18:03 |
clarkb | (havana works :) ) | 18:04 |
*** ruhe has quit IRC | 18:04 | |
*** metabro has joined #openstack-infra | 18:04 | |
fungi | clarkb: on a positive note, we got confirmation that those ghost "running" jobs zuul sees in the wake of the jenkins01 situation can effectively be cleared with a new patchset | 18:04 |
fungi | so i'm less worried about finding an opportunity to restart it | 18:04 |
clarkb | cool | 18:04 |
*** ilyashakhat has quit IRC | 18:05 | |
*** sarob has quit IRC | 18:05 | |
*** ilyashakhat has joined #openstack-infra | 18:05 | |
clarkb | SpamapS: if you want the bug that is currently most troublesome 1251920 seems to be the ticket (I know that number off the top of my head now) | 18:05 |
fungi | i'm working on adding more space to static.o.o now but i worry that we may max out the kernel limit on how many scsi block devices are allowed | 18:06 |
*** johnthetubaguy has joined #openstack-infra | 18:06 | |
*** johnthetubaguy1 has quit IRC | 18:06 | |
clarkb | silly kernel limits | 18:07 |
*** jerryz has joined #openstack-infra | 18:07 | |
fungi | we'll have something like 20 cinder volumes of 0.5tb each attached to it when i'm done | 18:08 |
*** dcramer_ has joined #openstack-infra | 18:09 | |
hub_cap | mordred: we want to remove our guest agent from our main repo and into its own. is that something that is ok to do w/o some sort of formal approval? | 18:09 |
mordred | hub_cap: I think it's a great idea - and it's all inside of your program, so I dont think it's a problem | 18:10 |
hub_cap | k wonderful | 18:10 |
* clarkb is going to be annoying. we should fix manage-projects before adding new projects | 18:10 | |
fungi | hub_cap: however, you probably want to explore git filter-branch (so you can preserve its revision control history) | 18:10 |
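A sketch of that history-preserving split; the repository URL and directory name are placeholders for wherever the guest agent code actually lives:

```
# work on a throwaway clone, then rewrite it so only the guest agent
# directory and the commits touching it remain
git clone https://git.openstack.org/openstack/<project> guestagent-split
cd guestagent-split
git filter-branch --prune-empty --subdirectory-filter <guestagent-dir> master
# push the rewritten history into the new project once it exists in gerrit
```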
SpamapS | clarkb: yeah I've done several rechecks and reverifies for 151920 | 18:10 |
hub_cap | fungi: roger | 18:11 |
SpamapS | 1251920 | 18:11 |
clarkb | or at least everyone should know that after rerunning manage-projects by hand successfully you probably need to reclone the repo for zuul | 18:11 |
*** sdake_ has joined #openstack-infra | 18:11 | |
*** sdake_ has quit IRC | 18:11 | |
*** sdake_ has joined #openstack-infra | 18:11 | |
mordred | clarkb: what's broke with it? | 18:11 |
mordred | is this the race-condition thing? | 18:11 |
clarkb | mordred: first run of manage-projects creates the project in gerrit but gives it no content because of a github problem (or some other failure), then zuul clones the empty repo | 18:11 |
*** sarob has joined #openstack-infra | 18:11 | |
mordred | clarkb: awesome | 18:11 |
clarkb | now zuul is stuck with an empty repo and when content arrives zuul can't resolve the mismatch | 18:11 |
clarkb | so you have to move the old repo zuul cloned aside then reclone it for zuul (because zuul thinks it is cloned and won't reclone itself) | 18:12 |
clarkb | if you look in the zuul users bash history there are some examples of recloning (you have to do it with the special GIT_SSH script) | 18:13 |
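Roughly, that recloning goes like the following; every path, hostname, and script name here is a placeholder, since the real locations are whatever the zuul user's history shows:

```
# as the zuul user: move the broken empty clone aside, then reclone using the
# wrapper script that supplies zuul's gerrit ssh key via GIT_SSH
mv /var/lib/zuul/git/openstack/newproject /var/lib/zuul/git/openstack/newproject.broken
GIT_SSH=/var/lib/zuul/ssh-wrapper.sh \
  git clone ssh://review.openstack.org:29418/openstack/newproject \
  /var/lib/zuul/git/openstack/newproject
```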
*** sparkycollier has quit IRC | 18:14 | |
*** dizquierdo has quit IRC | 18:14 | |
*** sparkycollier has joined #openstack-infra | 18:15 | |
clarkb | mordred: I think we just need to make sure that failures in manage-projects are more isolated from each other, eg github derping shouldn't prevent us from seeding content in gerrit | 18:16 |
mordred | clarkb: ++ | 18:16 |
mordred | I agree. I think that's a great idea | 18:16 |
*** sparkycollier has quit IRC | 18:17 | |
*** johnthetubaguy has quit IRC | 18:18 | |
*** ben_duyujie1 has quit IRC | 18:18 | |
*** marun has joined #openstack-infra | 18:19 | |
fungi | we may also want to consider making manage-projects slightly more stateful and not rely on it to do things like check to make sure every single project we have is configured in github, or perhaps only run it specifically for newly added/changed projects | 18:19 |
*** osanchez has quit IRC | 18:20 | |
fungi | crap. as i feared, nova volume-attach is not giving me devnodes beyond /dev/xvdq (just seems to keep reusing that one once i got to it) | 18:20 |
clarkb | :/ | 18:20 |
fungi | we may need to take some time to individually pvremove 0.5tb volumes and add 1tb volumes in their place | 18:21 |
clarkb | and then really start looking at swift again | 18:22 |
fungi | dmesg on static.o.o makes no mention of any new block devices getting hotadded after xvdp | 18:22 |
clarkb | fungi: you can easily swap out the 0.5TB volumes that you just added for 1TB volumes right? | 18:25 |
clarkb | and we can then worry about the others later? | 18:25 |
fungi | clarkb: yeah, that'll be a start, but it's not enough to get me to the total i was shooting for | 18:25 |
*** melwitt has joined #openstack-infra | 18:26 | |
fungi | however i can pvmove parts of the main vg to them, freeing up a few additional 0.5tb blockdevs which i can then vgreduce off of, pvremove, cinder detach, cinder delete and replace with more 1tb volumes | 18:27 |
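Spelled out per volume, that swap is roughly the following; device names, the volume group, and IDs are placeholders:

```
pvmove /dev/xvdh                 # migrate extents off the old 0.5TB PV
vgreduce <vg> /dev/xvdh          # remove it from the volume group
pvremove /dev/xvdh               # clear the LVM label
# then detach and delete the old cinder volume (nova volume-detach / cinder delete),
# attach a new 1TB volume, pvcreate it, and vgextend it back into <vg>
```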
*** hogepodge has joined #openstack-infra | 18:27 | |
*** ilyashakhat_ has joined #openstack-infra | 18:27 | |
fungi | but that's probably best left for a weekend when there's a lot less chance of breaking log uploads and causing jobs to fail | 18:28 |
*** ilyashakhat has quit IRC | 18:28 | |
lifeless | annegentle: I am! | 18:31 |
*** sarob has quit IRC | 18:31 | |
lifeless | annegentle: thats why we had dinner together in HK :) | 18:31 |
*** sarob has joined #openstack-infra | 18:32 | |
*** nsaje has quit IRC | 18:32 | |
*** MarkAtwood has joined #openstack-infra | 18:32 | |
*** nsaje has joined #openstack-infra | 18:33 | |
*** nsaje has quit IRC | 18:33 | |
*** jamesmcarthur has quit IRC | 18:33 | |
*** nsaje has joined #openstack-infra | 18:33 | |
jeblair | fungi: what limit are we hitting? | 18:34 |
*** jamesmcarthur has joined #openstack-infra | 18:35 | |
*** Bada has quit IRC | 18:35 | |
jeblair | clarkb: afaik we're planning on using swift, not "looking at it" | 18:35 |
jeblair | clarkb: at least, that's what i got out of that design summit session | 18:36 |
fungi | jeblair: good question. i don't get an error message anywhere obvious, but the kernel stops registering new xvd's after the 16th one it has (i think that's a kernel limit, maybe tunable via sysctl--checking now) | 18:36 |
*** sarob has quit IRC | 18:37 | |
clarkb | jeblair: right | 18:37 |
*** dcramer_ has quit IRC | 18:37 | |
jog0 | clarkb: ohh that makes sense, we went parallel for grenade because we thought that was a better solution than bumping the timeout on the job, because it was getting too long on RAX | 18:37 |
jog0 | clarkb: and e-r doesn't cover grenade | 18:38 |
jog0 | sdague: sorry ^ | 18:38 |
jeblair | fungi: hrm, that seems very low | 18:38 |
sdague | jog0: correct, e-r doesn't cover grenade | 18:41 |
sdague | some refactoring is needed for that | 18:41 |
mordred | clarkb, jeblair, fungi: hp cloud is having some capacity issues they're asking for some help with from us | 18:42 |
mordred | specifically, az2 has way more capacity right now, so I suggested that we request a doubling of our quota in az2, and the move half of our usage out of each of az1 and az3 to az2 | 18:42 |
mordred | they said that woudl be very helpful - any issues from you guys on moving forward on that? | 18:42 |
fungi | mordred: that sounds sane to me | 18:43 |
jeblair | mordred: ++ | 18:43 |
jog0 | sdague: what do you think the right step forward for fixing grenade is? | 18:44 |
jog0 | we can just look at the bug inside of nova we are hitting | 18:44 |
* clarkb will fire off an email right now | 18:44 | |
clarkb | re quota bump in az2 | 18:44 |
fungi | as for the 16 block device limit, i'm finding that at various points in time that was a per-domu limit in both xen and libvirt (thinking both are resolved now but still digging for confirmation), but also at one point linux lvm2 allowed no more than 16 pv components in a vg so need to figure out whether that's still the case as well | 18:45 |
sdague | jog0: honestly, I haven't looked at it yet | 18:47 |
* sdague still working through some onboarding tasks | 18:47 | |
jog0 | sdague: it took me months to onboard | 18:48 |
jog0 | sdague: no problem, short term I see two options: revert and bump timeout or fix nova bug | 18:49 |
jog0 | I am in favor of fixing nova instead | 18:49 |
sdague | jog0: so is the issue that parallel grenade broke the world? | 18:49 |
clarkb | no | 18:49 |
jog0 | sdague:just a tiny part | 18:49 |
clarkb | parallel grenade is a small problem compared to the other issues | 18:49 |
sdague | ok | 18:49 |
sdague | the timeout bump is fine, but I'm not convinced it will fix it | 18:50 |
jog0 | sdague: timeout bump + serial | 18:50 |
sdague | jenkins load means it's taking 30 minutes to even have a node ready | 18:50 |
*** ilyashakhat_ has quit IRC | 18:50 | |
jeblair | sdague: can you clarify? | 18:51 |
*** ilyashakhat has joined #openstack-infra | 18:51 | |
*** sandywalsh has quit IRC | 18:52 | |
sdague | so there was a grenade fail that BobBall posted earlier | 18:52 |
mgagne | zaro: ping | 18:53 |
*** jamesmcarthur has quit IRC | 18:53 | |
sdague | http://logs.openstack.org/31/57431/2/check/check-grenade-devstack-vm/742d85e/ | 18:53 |
sdague | jeblair: look at the timestamps from job kick off, until it actually does anything real | 18:54 |
sdague | http://logs.openstack.org/31/57431/2/check/check-grenade-devstack-vm/742d85e/console.html#_2013-11-20_15_49_44_264 | 18:54 |
jeblair | sdague: gotcha | 18:55 |
sdague | the next line is 49 minutes later | 18:55 |
sdague | so yes.... that would cause a timeout issue :) | 18:55 |
jog0 | wait grenade is still timing out? | 18:55 |
jog0 | even with parallel tests? | 18:55 |
sdague | yes | 18:55 |
sdague | because the tests aren't the problem | 18:55 |
jog0 | I was referring to a different bug where some of the tests failed | 18:55 |
jog0 | sdague: ohh | 18:56 |
sdague | it's taking 30 - 50 minutes before devstack even starts executing | 18:56 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Create jenkins03 and jenkins04 https://review.openstack.org/57510 | 18:56 |
sdague | which, based on past conversations means jenkins load is slowing things down | 18:57 |
sdague | I think | 18:57 |
jeblair | sdague: hrm, no actually... | 18:57 |
fungi | something between when gate-wrap.sh started executing and when grenade started to run | 18:57 |
jog0 | sdague: its getting old | 18:57 |
jeblair | sdague: what fungi said | 18:57 |
sdague | jeblair: ok, correct me if I'm wrong | 18:57 |
jog0 | soon it will need a jenkins walker | 18:57 |
sdague | oh, right | 18:57 |
fungi | we don't have timestamps in any of the setup logs though, so hard to tell | 18:58 |
sdague | yeh, I just noticed that | 18:58 |
zaro | mgagne: yo! | 18:59 |
mgagne | zaro: bug #1253180 Would assigning an empty dict to jobparams if value is None be an acceptable solution? Or should we wrap d.update(...) calls with an if jobparams: instead? | 18:59 |
uvirtbot | Launchpad bug 1253180 in openstack-ci "jenkins-job-builder exception on valid yaml" [Undecided,New] https://launchpad.net/bugs/1253180 | 18:59 |
*** sarob has joined #openstack-infra | 19:00 | |
* zaro reads bug | 19:00 | |
mgagne | zaro: jenkins_jobs/builder.py:148 | 19:02 |
zaro | mgagne: bug is a little confusing. that yaml is *not* valid, correct? | 19:02 |
mgagne | zaro: jobparams is None if a colon is introduced after the job name (and no params are given), as shown in the example. Later, this value is used to update a dict, and .update() expects the value to be iterable, not None. | 19:03 |
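The triggering input is a job list entry with a trailing colon and no parameters, along these lines (project and job names invented for the example); YAML parses that entry as a one-key mapping whose value is null:

```yaml
- project:
    name: example-project
    jobs:
      # the stray colon makes this item {'example-job': None}
      - 'example-job':
```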
mgagne | zaro: I took his words when he said it was valid. I'm checking atm | 19:03 |
*** alcabrera is now known as alcabrera|afk | 19:04 | |
zaro | mgagne: i'm pretty sure that's invalid. because it's a key value pair, but no value. | 19:05 |
*** rnirmal has quit IRC | 19:06 | |
clarkb | mordred: jeblair: fungi: quota bump request sent | 19:06 |
zaro | mgagne: but in any case, a better error message would be nice. | 19:06 |
*** sandywalsh has joined #openstack-infra | 19:08 | |
jog0 | so we think* we have a fix for the big bug (bug 1251920) | 19:10 |
uvirtbot | Launchpad bug 1251920 in nova "Tempest failures due to failure to return console logs from an instance" [Critical,In progress] https://launchpad.net/bugs/1251920 | 19:10 |
jog0 | https://review.openstack.org/57509 | 19:10 |
clarkb | jog0: if oslo hadn't been synced in 4 months why is havana fine? | 19:10 |
jog0 | can we call on any infra magic to get that bumped to the top of the queue | 19:10 |
jog0 | clarkb: see lifeless's email on the thread | 19:11 |
lifeless | clarkb: it may not be. | 19:11 |
clarkb | jog0: (just trying to understand why we think this will fix the problem) and why did it start on the 15th? | 19:11 |
lifeless | clarkb: I'm actually fairly worried about H right now. | 19:11 |
jog0 | clarkb: and TBH I am confused about this one | 19:11 |
clarkb | lifeless: k | 19:11 |
* clarkb reads email | 19:11 | |
mgagne | zaro: I think it is valid. | 19:11 |
jog0 | because the patch that we think broke things was for nova neutron. but the bug wasn't | 19:11 |
clarkb | fungi: jeblair mordred basically I think the question is do we stop zuul now, maybe ask everyone to stop approving things and or push patches (ha) get some of these bug related changes in then open the floodgates | 19:12 |
jog0 | although worst case is we are wrong and this fixes something else | 19:12 |
jog0 | I think | 19:12 |
clarkb | at this point I am game for trying it since we aren't really keeping up and gate resets are killing us | 19:13 |
clarkb | jog0: can you get a list of changes together so that we know which ones should be high priority? thinking of the swift timeout change and any nova changes | 19:13 |
sdague | once the check queue is back on that oslo change, we could also just ninja merge it | 19:13 |
clarkb | sdague: we could do that as well | 19:14 |
jog0 | clarkb: oh right | 19:14 |
zaro | mgagne: i think it's parser dependent. how are you validating? | 19:14 |
sdague | honestly, until we've got a +2 initiated ninja merge process, I'd just ninja merge things with good check results that we think fix the world | 19:14 |
clarkb | sdague: it leaves us open to small races that can make things worse but that seems like a low risk | 19:15 |
clarkb | (if eg something does get through the gate somehow :) and interferes with the forced in change) | 19:15 |
jog0 | clarkb: https://review.openstack.org/#/c/57373/ | 19:16 |
jog0 | let me rebase taht one | 19:16 |
jog0 | there were valid -1s on it | 19:16 |
mgagne | zaro: my gut is telling me it is. Trying to find references for that one. In key: value, if value is empty, it is null (None). | 19:16 |
jog0 | I htink those are the only two I know of | 19:17 |
jog0 | and the swift one is *way* lower priority | 19:17 |
clarkb | jog0: ok | 19:17 |
zaro | mgagne: ok. i've verified with python yaml. it does look ok. no value is interpreted as null | 19:18 |
mgagne | zaro: either the job name is a scalar (w/o job params) or it's a mapping with a single key: the key is the job name and the value is the job params, which we expect to be a mapping, not a scalar. | 19:18 |
jog0 | clarkb: so ignoer the swift one | 19:18 |
jog0 | that can wait | 19:18 |
*** alcabrera|afk is now known as alcabrera | 19:19 | |
jeblair | sdague, jog0, clarkb: if there are patches that we believe will fix things, i think we should declare queue bankruptcy, stop zuul, and then reverify those patches. | 19:20 |
*** ilyashakhat has quit IRC | 19:20 | |
*** roaet has left #openstack-infra | 19:20 | |
fungi | yeah, at the moment the head of the gate is changes which were approved 18 hours ago. when i started this morning it was only 14 hours... it looked like it had managed to work through quite a lot of things and gain ground while we were asleep. i think around 0800 utc or so the gate was only about 50 changes deep according to graphs | 19:20 |
jeblair | i also think people should stop approving things until the project is at least kinda working again... | 19:21 |
zaro | mgagne: i assume that if it's valid it should be accepted. | 19:21 |
*** ilyashakhat has joined #openstack-infra | 19:21 | |
jog0 | jeblair: if lifeless is right that one patch will fix most things | 19:21 |
clarkb | jeblair: ++ | 19:21 |
jeblair | so in concert with that, i'd send a msg to the list saying we have intentionally dropped the queue of approved patches, please reapprove only ones that fix known problems | 19:21 |
portante | jeblair, can we disable the starting of gate jobs off of approvals? | 19:21 |
portante | make it manual for now? | 19:21 |
jeblair | portante: approvals are the manual starting of gate jobs | 19:22 |
jog0 | portante: ca nyou look at a https://review.openstack.org/#/c/57373/2 | 19:22 |
jog0 | sdague: ^ | 19:22 |
jog0 | if you both sign off we can push that to the top of the queue too | 19:22 |
clarkb | jeblair: any opinions on who should be doing what? | 19:22 |
jeblair | clarkb: do we have a (set of) patch(es) ready for this? | 19:23 |
zaro | mgagne: so i guess jjb should not even throw an exception at all. | 19:23 |
clarkb | I am happy to do the zuul stop start or collect the current list of changes so that we can swing around later and reverify/recheck | 19:23 |
clarkb | jeblair: jog0 indicates that the only one we should worry about is the nova change https://review.openstack.org/#/c/57509/2 | 19:23 |
clarkb | jog0: we might as well get the swift change in too if possible | 19:24 |
mgagne | zaro: we should handle this use case. | 19:24 |
jeblair | jog0: (i think the standard is 'Co-Authored-By', btw) | 19:24 |
clarkb | so maybe we need to sync up with nova cores to make sure they are happy getting that code in asap? | 19:24 |
portante | jog0: looks fine, I wrote this up to add a bit more detail to help others less familiar: http://paste.openstack.org/show/53694/ | 19:24 |
jog0 | jeblair: thats what the other co-author in devstack said | 19:24 |
mgagne | zaro: I see 2 possible solutions: Assigning an empty dict to jobparams if value is None Or wrap d.update(...) calls with an if jobparams: | 19:24 |
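A minimal sketch of the first option (normalize None to an empty dict before updating); the function and variable names are illustrative, not the exact code around jenkins_jobs/builder.py:148:

```python
def expand_job_params(jobname, jobparams, group_params):
    # '- some-job:' with a trailing colon parses to {'some-job': None},
    # and dict.update(None) raises a TypeError, so normalize first
    if jobparams is None:
        jobparams = {}
    params = {}
    params.update(group_params)
    params.update(jobparams)
    return jobname, params
```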
jog0 | (co-auth-by) | 19:25 |
*** markwash has joined #openstack-infra | 19:25 | |
jeblair | jog0: you said '-with' in that swift commit | 19:25 |
jog0 | jeblair: thats what I ment | 19:25 |
jog0 | meant | 19:25 |
jeblair | jog0: it's wrong :) | 19:25 |
jog0 | jeblair: see I0652f639673e600fd7508a9869ec85f8d5ce4518 | 19:25 |
jog0 | and blame sdague | 19:26 |
portante | jeblair: while approvals might be the manual way, there are those that will still be approving requests that are not necessary to get the gate jobs working | 19:26 |
* zaro pulls jjb master | 19:26 | |
jog0 | so bug fix patches: https://review.openstack.org/57509 https://review.openstack.org/#/c/57373 | 19:26 |
fungi | jog0: see https://wiki.openstack.org/wiki/GitCommitMessages#Including_external_references ;) | 19:26 |
clarkb | woot we have more quota, I will propose a nodepool config change to rebalance our node distribution | 19:27 |
jeblair | jog0: i'm telling you that commit is wrong. that's really really wrong, in fact. that's monty saying that he co-authored a patch with himself. which is ridiculous. | 19:27 |
jeblair | jog0: what does sdague have to do with it? | 19:27 |
jog0 | jeblair: https://review.openstack.org/#/c/35705/ is that patch I took the format from | 19:28 |
jeblair | clarkb: so, step 1 is get 57509 APRV+1, then let's save the queue, stop zuul, and reverify/reapprove that one | 19:28 |
jog0 | jeblair: I will fix | 19:28 |
clarkb | jeblair: sounds good | 19:28 |
jeblair | jog0: i know. you said that already. it's wrong. :) nothing i can do about that other than tell you it's wrong. | 19:29 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Shift more node use to HPCloud AZ2. https://review.openstack.org/57515 | 19:29 |
jog0 | jeblair: its not wrong anymore | 19:29 |
jog0 | fixed | 19:29 |
clarkb | jeblair: fungi mordred ^ rebalance the hpcloud nodes | 19:30 |
*** ilyashakhat has quit IRC | 19:30 | |
lifeless | jog0: it fixes two of the bugs - see comments in the review. | 19:31 |
fungi | clarkb: plus 18 extra nodes? | 19:31 |
*** hashar has joined #openstack-infra | 19:31 | |
clarkb | fungi: 12 | 19:31 |
*** ilyashakhat has joined #openstack-infra | 19:31 | |
*** melwitt has quit IRC | 19:31 | |
*** melwitt1 has joined #openstack-infra | 19:32 | |
clarkb | fungi: the limit is 96*3 (or was); I am bumping az2 to 192 and cutting the theoretical limit (96) in half for the others, not the 90 in half | 19:32 |
clarkb | fungi: I can s/48/45/ though | 19:32 |
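In nodepool.yaml terms the rebalance presumably boils down to per-provider limits along these lines (provider names and exact numbers are illustrative; the real values are in change 57515):

```yaml
providers:
  - name: hpcloud-az1
    max-servers: 48    # roughly halved
  - name: hpcloud-az2
    max-servers: 192   # the doubled quota lands here
  - name: hpcloud-az3
    max-servers: 48
```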
zaro | mgagne: imo wrapping d.update() would be better. | 19:32 |
jog0 | lifeless: holy crap sweet | 19:32 |
fungi | clarkb: cool. just pointing out that 192+48*2-90*3=18 | 19:33 |
*** mihgen has quit IRC | 19:33 | |
jeblair | clarkb: ah yeah, that was to give a small buffer for node leakage; we'll get more quota errors in the log without it. either way. | 19:33 |
clarkb | jeblair: the new limit in AZ2 is above 192 so we should be fine | 19:33 |
fungi | wfm | 19:34 |
*** danger_fo_away is now known as dangers | 19:35 | |
jeblair | clarkb, sdague, jog0: are we confident enough in that fix to go ahead and restore the gate queue after it? | 19:35 |
clarkb | jeblair: I have asked in #openstack-nova for people to look at it | 19:35 |
clarkb | jog0 is rallying the troops there | 19:36 |
jog0 | clarkb: I think they are all sleeping | 19:36 |
jog0 | including sdague | 19:36 |
jeblair | that sounds like a really good idea right now :) | 19:37 |
sdague | jog0: so there is the little fix (the 7 line tls one) and the big sync | 19:38 |
sdague | which one are you talking about? | 19:38 |
jog0 | sdague: the little one | 19:38 |
jog0 | https://review.openstack.org/#/c/57509/ | 19:38 |
sdague | I want test results back on it before I +2 it | 19:38 |
jog0 | sdague: can you A+ ASAP | 19:38 |
jog0 | we have results | 19:38 |
*** sarob has quit IRC | 19:38 | |
jog0 | see peter's comments | 19:38 |
sdague | we don't have jenkins results | 19:39 |
jog0 | sdague: and AFAIK we will still gate on it | 19:39 |
jog0 | clarkb: ^ | 19:39 |
*** sarob has joined #openstack-infra | 19:39 | |
jog0 | we won't because jenkins is backed up | 19:39 |
clarkb | jog0: we will still gate on it, it will go to the head of the queue | 19:39 |
jog0 | sdague: ^ | 19:39 |
jog0 | right | 19:39 |
clarkb | and get tested ahead of everything else | 19:39 |
jog0 | otherwise we wait hours | 19:39 |
*** ^demon|sick has quit IRC | 19:39 | |
clarkb | we just got approval | 19:39 |
fungi | and hopefully not hit a nondeterministic error | 19:39 |
fungi | and get kicked back out | 19:40 |
clarkb | jeblair: fungi: ready? | 19:40 |
jog0 | fungi: heh yeah | 19:40 |
jeblair | clarkb: i'm saving queues now | 19:40 |
*** markmc has joined #openstack-infra | 19:40 | |
*** ^demon|sick has joined #openstack-infra | 19:40 | |
sdague | ok, sorry, was confused on the ordering | 19:40 |
*** nsaje has quit IRC | 19:41 | |
*** blamar has quit IRC | 19:41 | |
jeblair | clarkb: i have saved the check and gate queues | 19:41 |
clarkb | jeblair: should I stop then start zuul now? | 19:41 |
jeblair | clarkb: go for it | 19:41 |
clarkb | we may also need to manually cleanup some jobs in jenkins afterwards to free up slaves | 19:41 |
clarkb | stopping zuul now | 19:41 |
jeblair | clarkb: it would be easiest to do that while zuul is stopped | 19:42 |
jeblair | clarkb: so why don't you delay between stopping and starting | 19:42 |
jeblair | clarkb: and we'll go kill jobs | 19:42 |
fungi | agreed. jumping in the jenkins01/02 webuis now | 19:42 |
clarkb | jeblair: oh too late | 19:42 |
jeblair | clarkb: meh. just stop it again. :) | 19:42 |
clarkb | I can stop zuul again really quickly | 19:42 |
clarkb | done | 19:42 |
* clarkb starts killing jobs on jenkins02 | 19:43 | |
*** hogepodge has quit IRC | 19:43 | |
fungi | i'll start with jenkins01 jobs then | 19:43 |
jeblair | i will too. shouldn't hurt if we double-kill things. | 19:44 |
*** blamar has joined #openstack-infra | 19:44 | |
*** sarob has quit IRC | 19:44 | |
* fungi is working from the bottom on 01 | 19:44 | |
jeblair | i'll work from the top | 19:45 |
* jog0 owes everyone here a beer | 19:45 | |
*** hogepodge has joined #openstack-infra | 19:46 | |
fungi | jenkins01 is clean | 19:47 |
fungi | pitching in on 02 now | 19:47 |
clarkb | there are a couple stubborn jobs on 02 near the top; I may just ignore them for now | 19:49 |
jeblair | i've got them all open in tabs | 19:49 |
jeblair | so i can continue to try to kill them or at least track them | 19:49 |
jeblair | clarkb: why don't you start zuul now | 19:49 |
clarkb | jeblair: doing that now | 19:50 |
clarkb | zuul is starting | 19:50 |
fungi | i am going to guess that 02 is being more of a pain because 01 was restarted only a couple days ago | 19:50 |
clarkb | ready to reverify the nova change? | 19:51 |
*** Ryan_Lane has joined #openstack-infra | 19:51 | |
jeblair | clarkb: done | 19:51 |
clarkb | we should recheck the full nova sync as well | 19:51 |
jeblair | the jobs on 02 were all started ~7 hours 50 mins ago | 19:51 |
* clarkb rechecks the full nova sync | 19:51 | |
*** dstanek has quit IRC | 19:52 | |
clarkb | ok those jobs have started, any other changes we want to get in asap? maybe the az2 rebalance config change? | 19:52 |
jog0 | clarkb: and the swift patch if you want | 19:52 |
jog0 | https://review.openstack.org/#/c/57373/ | 19:52 |
jeblair | clarkb: go for it | 19:52 |
jeblair | jog0: not approved yet | 19:52 |
jog0 | sdague: ^ | 19:52 |
clarkb | approved the nodepool config change | 19:52 |
jog0 | jeblair: I ment for the check queue | 19:52 |
jeblair | jog0: ah sure | 19:53 |
sdague | approved now | 19:53 |
jog0 | sdague: thanks | 19:53 |
fungi | looks like precise3 and precise23 are dead. i'll see what i can do to get them back on line | 19:53 |
jog0 | portante: ^ | 19:53 |
jeblair | i'll clean up those stuck nodes on jenkins02 | 19:54 |
*** dstanek has joined #openstack-infra | 19:54 | |
openstackgerrit | A change was merged to openstack-infra/config: Shift more node use to HPCloud AZ2. https://review.openstack.org/57515 | 19:54 |
*** ^demon|sick is now known as ^d | 19:54 | |
*** ^d has joined #openstack-infra | 19:54 | |
*** reed_ is now known as reed | 19:54 | |
*** reed has quit IRC | 19:54 | |
*** reed has joined #openstack-infra | 19:54 | |
portante | jog0: ? | 19:55 |
jog0 | portante: your patch is on the top of the merge queue | 19:55 |
jeblair | clarkb, jog0, sdague: so should we re-load the queue? or just drop it? | 19:55 |
portante | I see the swift patch cool | 19:55 |
jog0 | anteaya: you had something | 19:56 |
anteaya | neutron needs https://review.openstack.org/#/c/53188/ and https://review.openstack.org/#/c/57475/ | 19:56 |
portante | there is another swift fix that seems to affect glance runs, that is 57019 | 19:56 |
anteaya | to merge in prep for a bug fix patch | 19:56 |
portante | jog0: | 19:56 |
clarkb | jog0: I think we should wait a little longer before really loading it back up again | 19:56 |
clarkb | er jeblair ^ | 19:56 |
jog0 | anteaya: 56475 isn't approved | 19:56 |
portante | https://review.openstack.org/57019 | 19:56 |
* koolhead17 lurks | 19:56 | |
clarkb | portante: I think you can reverify that one | 19:57 |
anteaya | jog0: asking in -neutron | 19:57 |
jog0 | clarkb: 57019 and https://review.openstack.org/#/c/57018/2 | 19:57 |
jeblair | clarkb, portante: i just reverified it | 19:57 |
clarkb | jeblair: that reverify seems to have grabbed 57018 too | 19:58 |
jeblair | clarkb: dependent change | 19:58 |
fungi | yeah, parent | 19:59 |
fungi | 57511 looks most unhappy | 19:59 |
fungi | DuplicateOptError: duplicate option: policy_file | 20:00 |
* portante is back | 20:00 | |
fungi | it'll need to be reworked i guess | 20:00 |
jog0 | fungi: 57511 will have to wait | 20:00 |
jog0 | its not critical (that we know of) | 20:00 |
fungi | oh, i see. that one just happened to jump in as we started zuul, wasn't one of the set we cared about | 20:01 |
clarkb | fungi: I rechecked it | 20:02 |
clarkb | fungi: because it is the larger oslo sync which we need to get in for longer term oslo syncing | 20:02 |
clarkb | but doesn't affect the immediate problem | 20:02 |
fungi | precise3 and precise23 are back on line in jenkins now and seem to both be running jobs | 20:02 |
*** SergeyLukjanov has joined #openstack-infra | 20:08 | |
*** hogepodge has quit IRC | 20:08 | |
lifeless | gate-tempest-devstack-vm-neutron-large-ops: SUCCESS | 20:10 |
lifeless | thats a good sign | 20:10 |
*** sdake_ is now known as randallburt | 20:10 | |
fungi | yeah, early signs on the critical changes look good\ | 20:10 |
sdague | anteaya: do you have a consolidated email of the details so far on the code sprint (if one hit the list recently, a link is good enough), so I can start running it up the chain here? | 20:11 |
clarkb | I am drafting a thing at https://etherpad.openstack.org/p/icehouse-1-gate-reset | 20:11 |
anteaya | sdague: not yet, will provide | 20:11 |
*** derekh has joined #openstack-infra | 20:11 | |
*** dstanek has quit IRC | 20:12 | |
sdague | anteaya: cool, thanks | 20:12 |
jog0 | now that we are still in a fairly critical mode, its time for me to go to lunch | 20:12 |
anteaya | jog0: enjoy lunch | 20:12 |
jog0 | didn't think the timing would be so bad | 20:13 |
jog0 | if anything major comes up, email is the quickest way to get me for the next 45 minutes or so | 20:13 |
jog0 | clarkb: thanks for writing this up, it looks like we are fixing 4 or 5 bugs all at once | 20:13 |
jog0 | which is good | 20:13 |
jog0 | at least | 20:13 |
fungi | jog0: according to the time estimates, 45 minutes should be just about right to find out if it worked | 20:14 |
portante | gate jobs seem to be creeping in | 20:16 |
portante | do we want all these? | 20:16 |
clarkb | portante: not really | 20:16 |
clarkb | portante: we probably needed to shout louder about leaving the gate alone | 20:17 |
*** melwitt1 has quit IRC | 20:17 | |
clarkb | portante: worst case we do what we did again and yell louder :) I think we will just live with it for now, the important things are looking good so far | 20:17 |
portante | yes | 20:17 |
*** melwitt has joined #openstack-infra | 20:17 | |
portante | k | 20:17 |
jeblair | clarkb: well, we haven't asked anyone to do that yet, so it's no surprise they didn't listen | 20:17 |
clarkb | jeblair: right | 20:18 |
jeblair | clarkb: but at any rate, yeah, i don't think it's a big deal. if we have to move something else, we can just kill it again | 20:18 |
clarkb | jeblair: I am trying to get a cohesive thought going in https://etherpad.openstack.org/p/icehouse-1-gate-reset | 20:18 |
clarkb | jeblair: I agree | 20:18 |
*** randallburt is now known as sdkae | 20:19 | |
clarkb | bah grenade just failed in the nova fix | 20:19 |
*** sdkae is now known as sdake_ | 20:19 | |
portante | yup | 20:19 |
*** vipul is now known as vipul-away | 20:19 | |
*** vipul-away is now known as vipul | 20:19 | |
jeblair | clarkb: because of the problem fixed in the grenade fix? | 20:19 |
jeblair | er devstack | 20:19 |
fungi | request timeout on verify resize | 20:20 |
clarkb | jeblair: maybe | 20:20 |
clarkb | portante: any chance you can look at the logs? | 20:20 |
*** ilyashakhat has quit IRC | 20:20 | |
portante | yes | 20:20 |
*** DinaBelova has joined #openstack-infra | 20:20 | |
portante | gotta link? | 20:20 |
fungi | portante: https://jenkins02.openstack.org/job/gate-grenade-devstack-vm/17023/consoleFull | 20:21 |
*** sdake_ is now known as randallburt | 20:21 | |
*** ilyashakhat has joined #openstack-infra | 20:21 | |
portante | what about the syslog.txt file? | 20:22 |
*** eharney has joined #openstack-infra | 20:22 | |
fungi | getting | 20:22 |
portante | k thx | 20:23 |
fungi | portante: http://logs.openstack.org/09/57509/2/gate/gate-grenade-devstack-vm/d2252da/logs/ | 20:24 |
jeblair | ok, all those stuck nodes on jenkins02 are deleted | 20:25 |
*** randallburt is now known as sdake_ | 20:26 | |
fungi | clarkb: your wording in the etherpad is spot on. i don't see a thing i disagree with or would change | 20:27 |
jeblair | ++ | 20:27 |
fungi | also thank you for drafting that | 20:28 |
*** dstanek has joined #openstack-infra | 20:28 | |
clarkb | cool, do we want to send it as a collective? | 20:29 |
*** ilyashakhat has quit IRC | 20:29 | |
clarkb | (I don't mind being the mail list target if not :) ) | 20:29 |
fungi | i'm fine with it either way, but your words can be your words and we can jump in if there's any contention | 20:30 |
*** ilyashakhat has joined #openstack-infra | 20:30 | |
*** sandywalsh has quit IRC | 20:30 | |
clarkb | http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-full/c500394/console.html it failed the full tempest runs too | 20:31 |
jeblair | clarkb: ++ what fungi said | 20:31 |
clarkb | in similar ways maybe? I think this bug fix may expose something else? boy wouldn't that be fun | 20:31 |
*** hashar has left #openstack-infra | 20:31 | |
anteaya | +1 on the ether pad clarkb | 20:32 |
*** hashar has joined #openstack-infra | 20:32 | |
clarkb | ok /me sends mail to openstack-dev | 20:32 |
clarkb | we can retry the reverify and see if the changes ahead of it help | 20:33 |
* portante still looking | 20:33 | |
* clarkb awaits portante's analysis | 20:34 | |
portante | guys, the rsyslog buffers are truncated | 20:34 |
*** kgriffs has joined #openstack-infra | 20:34 | |
portante | pki token values are huge | 20:34 |
portante | and some of the tracebacks are cut off | 20:34 |
portante | can we get a syslog config bumped to use bigger buffers | 20:34 |
clarkb | we did that in the grizzly time frame, maybe we didn't go big enough | 20:34 |
clarkb | portante: this is one reason we don't use syslog for most of the service logging though | 20:35 |
portante | they appear to be 2K | 20:35 |
clarkb | hmm I thought we bumped to 65k | 20:35 |
portante | yes | 20:35 |
portante | maybe something else is truncating, then | 20:35 |
clarkb | oh we ensure absent on the file that bumped the buffer | 20:36 |
clarkb | we must've reverted that change | 20:36 |
portante | hmm | 20:37 |
portante | we might be able to put an option in to not log the entire PKI token, but that won't help with the Tracebacks | 20:37 |
jeblair | portante: what is it that's only logged to syslog and not to a file? | 20:37 |
lifeless | clarkb: is it 'gear' we use? | 20:38 |
clarkb | lifeless: yes | 20:38 |
lifeless | clarkb: thats not on pypi? | 20:38 |
clarkb | lifeless: it is | 20:38 |
lifeless | clarkb: do we run gearmand? | 20:38 |
clarkb | jeblair: swift proxy logs | 20:38 |
clarkb | lifeless: no we run geard | 20:38 |
lifeless | clarkb: or the python thing you pointed me at? | 20:38 |
jeblair | clarkb: that's not screen-s-proxy.txt ? | 20:39 |
lifeless | derekh: dprince: pleia2: https://pypi.python.org/pypi/gear | 20:39 |
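For context, gear is the pure-Python gearman library published on PyPI (linked above), and it ships geard, the standalone server infra runs instead of the C gearmand. A minimal sketch of standing it up; the -p flag is an assumption, so check geard --help on your version:

    # install the pure-python gearman library/server from pypi
    pip install gear
    # run the standalone server on the conventional gearman port
    geard -p 4730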
fungi | clarkb: didn't we undo the buffer change because we were overrunning/crashing/something rsyslog? | 20:39 |
portante | jeblair, not sure what you mean | 20:39 |
jeblair | portante: eg http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-full/c500394/logs/screen-s-proxy.txt.gz | 20:39 |
clarkb | jeblair: it is but that file doesn't have timestamps and other nice things | 20:39 |
jeblair | clarkb: i see. | 20:40 |
clarkb | fungi: oh yeah, the service would completely fall over resulting in test failures | 20:40 |
jeblair | portante: can you add timestamps and other nice things to the swift proxy logs? | 20:40 |
hashar | jeblair: hi :-D I had an issue with the Gearman Jenkins plugin. What would be the best way to expose it: openstack-infra list || launchpad bug ? | 20:41 |
dprince | clarkb/lifeless: then why in the world do we have a puppet module for saz-gearman | 20:41 |
* dprince is confused | 20:42 | |
jeblair | hashar: https://launchpad.net/gearman-plugin | 20:42 |
clarkb | dprince: because we were going to use C gearman, but then we found out C gearman is special in a couple ways | 20:42 |
hashar | jeblair: thanks :-] | 20:42 |
portante | jeblair: see https://review.openstack.org/56692 | 20:42 |
dprince | clarkb: so we can nuke that puppet module? | 20:42 |
openstackgerrit | Mathieu Gagné proposed a change to openstack-infra/jenkins-job-builder: Ensure jobparams and group_jobparams are dict https://review.openstack.org/57525 | 20:42 |
dprince | clarkb: the source of my confusion I think!!! | 20:42 |
jeblair | dprince: yes | 20:42 |
clarkb | dprince: probably, unless jeblair wants to move to the C server at some point | 20:42 |
jeblair | portante: neat :) | 20:43 |
portante | but that is only for proxy request logging | 20:44 |
*** blamar has quit IRC | 20:44 | |
portante | jeblair: ^ | 20:44 |
* clarkb finally sends mail. hopefully we get good discussion | 20:44 | |
*** sandywalsh has joined #openstack-infra | 20:44 | |
portante | jeblair: it does not fix it for the other logs, that depends on how that is configured in devstack | 20:45 |
portante | do you know how that happens? | 20:45 |
jeblair | portante: i'm no expert in that, but i'd expect something in http://git.openstack.org/cgit/openstack-dev/devstack/tree/lib/swift | 20:46 |
*** vipul is now known as vipul-away | 20:48 | |
*** SergeyLukjanov has quit IRC | 20:49 | |
*** yolanda has quit IRC | 20:49 | |
*** hogepodge has joined #openstack-infra | 20:49 | |
*** blamar has joined #openstack-infra | 20:50 | |
anteaya | we have a patch offered https://review.openstack.org/#/c/57290/ | 20:50 |
*** senk has quit IRC | 20:50 | |
anteaya | but it hasn't passed check yet | 20:50 |
anteaya | we hope it will address https://bugs.launchpad.net/swift/+bug/1224001 | 20:51 |
uvirtbot | Launchpad bug 1224001 in neutron "test_network_basic_ops fails waiting for network to become available" [High,In progress] | 20:51 |
clarkb | anteaya: looks like check tests are running on it now | 20:51 |
anteaya | cool | 20:51 |
*** vipul-away is now known as vipul | 20:51 | |
*** marun has quit IRC | 20:53 | |
anteaya | neutron likes 57290 so once it has passed check they are ready to approve | 20:53 |
portante | jeblair: looks like some uncaught exceptions in swift, not sure if they are related yet | 20:54 |
*** rpodolyaka1 has joined #openstack-infra | 20:54 | |
*** senk has joined #openstack-infra | 20:54 | |
clarkb | I am going to go grab a quick bite for lunch while we are playing the waiting game. | 20:54 |
clarkb | back in a bit | 20:55 |
anteaya | fungi jeblair 57290 has both check and gate jobs running on it at the same time | 20:55 |
anteaya | they were keen to get it through | 20:55 |
fungi | anteaya: that's possible if it was approved or reverified while checks were running | 20:55 |
anteaya | I would like it to pass check first and then queue for gate | 20:55 |
anteaya | it was | 20:55 |
anteaya | suggestions at this point? | 20:56 |
anteaya | leave it or ask for the approval to be removed? | 20:56 |
fungi | jeblair: clarkb: i gave up trying to figure out whether the 16 xvd limit is an underlying limitation on rackspace's xen rev and just opened a support case with them instead. Ticket ID | 20:56 |
fungi | 131120-00494-3 | 20:56 |
jeblair | fungi: cool | 20:56 |
jeblair | anteaya: leave it | 20:56 |
anteaya | leaving it | 20:56 |
*** jergerber has quit IRC | 20:57 | |
*** MarkAtwood has quit IRC | 20:58 | |
*** alcabrera has quit IRC | 20:59 | |
portante | jeblair: okay, I see, swift is run behind apache in this case, and so the default logging we do to the console does not add a timestamp | 21:00 |
*** DinaBelova has quit IRC | 21:00 | |
*** shardy has quit IRC | 21:01 | |
*** rpodolyaka1 has quit IRC | 21:01 | |
*** sarob has joined #openstack-infra | 21:01 | |
anteaya | won't matter anyway, the dependency patch failed in the gate, it appears it had never passed check | 21:01 |
anteaya | we are chatting about the importance of passing check tests in -neutron | 21:01 |
anteaya | well, I am anyway | 21:01 |
*** sarob has quit IRC | 21:02 | |
*** shardy has joined #openstack-infra | 21:03 | |
*** ^d has quit IRC | 21:03 | |
*** sarob has joined #openstack-infra | 21:04 | |
*** kgriffs is now known as kgriffs_afk | 21:04 | |
fungi | looks like the devstack 57373 change we wanted is also failing out on grenade in the post-upgrade tempest run (setupclass on tempest.api.compute.servers.test_server_addresses.ServerAddressesTestXML and tempest.api.volume.test_volumes_actions.VolumesActionsTest as well as tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm) | 21:04 |
*** ^d has joined #openstack-infra | 21:04 | |
*** sarob has quit IRC | 21:06 | |
*** sarob has joined #openstack-infra | 21:06 | |
pabelanger | clarkb, fungi: Was there any discussion about publishing horizon to pypi at the summit? | 21:06 |
pabelanger | In reference to https://review.openstack.org/#/c/54795/ | 21:07 |
fungi | pabelanger: not sure--i wasn't in any of the horizon sessions though there was some general discussion of possibly publishing all things to pypi once mordred's wheels bits are in and working as intended | 21:07 |
pabelanger | fair enough | 21:08 |
portante | jeblair, jog0: https://review.openstack.org/57526 | 21:09 |
david-lyle | pabelanger: we also plan to split horizon into a ui-toolkit library and what is now the openstack dashboard, but that will only get half to pypi without the wheels change | 21:09 |
notmyname | portante: devstack is running swift behind apache? | 21:10 |
portante | apparently | 21:11 |
notmyname | weird | 21:11 |
pabelanger | david-lyle, Ya, that's the main reason I wanted to see it up on pypi. I'm building a dashboard on top of it :) I should idle back in #openstack-horizon again to follow that development | 21:11 |
david-lyle | pabelanger: looking at i-2 for that split | 21:12 |
portante | probably not a good idea to only run behind apache, should probably have environments using the WSGI wrappers we provide | 21:12 |
mordred | pabelanger, fungi what? | 21:12 |
* pabelanger is excited | 21:12 | |
*** herndon_ has joined #openstack-infra | 21:12 | |
pabelanger | mordred, I opened a review about publishing horizon to pypi before the summit. Was asked to hold off until people could talk more about it; I was just looking for an update | 21:13 |
fungi | mordred: i thought we had discussed expanding the scope of which projects we publish to pypi once we're able to safely do prerelease versions there and have signing of pypi uploads in place | 21:13 |
mordred | yes | 21:13 |
mordred | two things | 21:13 |
mordred | a) splitting horizon and dashboard | 21:13 |
mordred | b) publishing all the things to pypi | 21:13 |
mordred | both are on the roadmap | 21:13 |
pabelanger | danke | 21:13 |
fungi | that matches my recollection | 21:14 |
pabelanger | I'll abandon my review for the time being | 21:14 |
*** DennyZhang has joined #openstack-infra | 21:15 | |
openstackgerrit | Dan Prince proposed a change to openstack-infra/config: Drop the saz-gearman module (we don't use it) https://review.openstack.org/57527 | 21:16 |
*** vipul is now known as vipul-away | 21:20 | |
*** vipul-away is now known as vipul | 21:20 | |
portante | jeblair: where are we at with the rsyslog buffer size thread of inquiry? | 21:20 |
*** otherwiseguy has joined #openstack-infra | 21:22 | |
lifeless | did 57509 fail? | 21:22 |
anteaya | fungi jeblair otherwiseguy wants to create this dependency chain 53188 -> 54747 -> 57290 | 21:23 |
lifeless | oh, the other flaky test shot it in the head? | 21:23 |
*** dprince has quit IRC | 21:23 | |
*** hashar has quit IRC | 21:23 | |
jeblair | portante: i thought clarkb was working on that; i don't recall the changes/reversion he mentioned (i may not have been a part of that) | 21:23 |
anteaya | given that he has 53188 -> 54745 already | 21:23 |
portante | sorry, jeblair, clarkb? | 21:24 |
anteaya | how does he add 57290 to the end? | 21:24 |
anteaya | it is 54745 right otherwiseguy? not 54747 | 21:24 |
jeblair | anteaya: git review -d 54747; git review -x 57290; git review | 21:24 |
otherwiseguy | yeah, 54757 | 21:24 |
otherwiseguy | jeblair: thanks | 21:25 |
* otherwiseguy tries | 21:25 | |
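Spelled out, the dependency-chain workflow jeblair gives above looks roughly like this (review numbers are the ones from the discussion; a local checkout of the project is assumed):

    # download the tip of the existing chain into a local branch
    git review -d 54747
    # cherry-pick the change that should depend on it on top
    git review -x 57290
    # push it back up; gerrit now records 57290 as depending on 54747
    git review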
jeblair | lifeless: i believe portante is looking into 57509 | 21:25 |
anteaya | jeblair: thank you | 21:26 |
lifeless | jeblair: it ran into https://bugs.launchpad.net/tempest/+bug/1230354 | 21:26 |
uvirtbot | Launchpad bug 1230354 in tempest "tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario fails sporadically" [Undecided,Confirmed] | 21:26 |
lifeless | jeblair: so is now requeued behind a bunch of other gate jobs | 21:26 |
fungi | all three of the fixes we tried to queue up front failed grenade | 21:27 |
anteaya | it was 57475 | 21:27 |
fungi | the swift one (well, its parent) just tanked too | 21:27 |
jeblair | portante: i think that conflicts with https://review.openstack.org/#/c/57373/4 (jog0's change that lists you as a co-author) | 21:29 |
jeblair | fungi: 57509, 57373, and what was the 3rd? | 21:30 |
jeblair | fungi: oh https://review.openstack.org/#/c/57019/ (and 18) | 21:31 |
fungi | yeah, that one | 21:31 |
*** LarsN has quit IRC | 21:32 | |
*** hogepodge has quit IRC | 21:32 | |
morganfainberg | perhaps there should be a zuul "mode" that allows administrative loading of tasks but just spools or ignores other tasks? | 21:33 |
jeblair | so 509 failed on a bug we don't have a fix for | 21:33 |
*** kgriffs_afk is now known as kgriffs | 21:33 | |
portante | regarding 509, it is hard to tell what happened with swift, but it does not appear to be part of the cause of 509 | 21:34 |
jeblair | morganfainberg: we try to empower the core teams of the projects themselves. we don't like to engineer things that make us special gatekeepers. | 21:34 |
morganfainberg | jeblair, thats fine, but in cases like today, perhaps it would be worthwhile? | 21:34 |
fungi | slippery slope | 21:35 |
*** hogepodge has joined #openstack-infra | 21:35 | |
portante | jeblair: yes, I'll fix that | 21:35 |
otherwiseguy | jeblair: that worked great. thanks again. | 21:35 |
morganfainberg | fungi, agreed. but sometimes it's worth considering for extraordinary times | 21:35 |
jog0 | I see the gate still doesn't look good | 21:38 |
jeblair | jog0: https://review.openstack.org/#/c/57509/ failed on https://bugs.launchpad.net/tempest/+bug/1230354 ; can you look at that bug? it doesn't look very fleshed out | 21:39 |
uvirtbot | Launchpad bug 1230354 in tempest "tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario fails sporadically" [Undecided,Confirmed] | 21:39 |
jog0 | jeblair: looking | 21:40 |
jog0 | jeblair: anything get merged? | 21:41 |
jeblair | jog0: no; 57509, 57373, 57019 all failed | 21:41 |
fungi | jog0: not any of the changes we put at the front anyway. all failed out | 21:41 |
jog0 | :( | 21:42 |
jeblair | fungi: 019 just got a comment saying someone is rebasing it on master (instead of 018), so that may improve its chances | 21:42 |
jog0 | clarkb: what happened to your tempest test patch to stable? | 21:42 |
jog0 | jeblair: that bug is hella vague | 21:42 |
*** hogepodge has quit IRC | 21:43 | |
jog0 | this just reinforces fungi's theory about bugs | 21:44 |
jeblair | fungi, jog0: which is? | 21:44 |
jog0 | jeblair: something about bugs making friends with more bugs | 21:45 |
jog0 | "nondeterministic failures breed more nondeterministic failures, because people are so used to having to reverify their patches to get them to merge that they are doing so even when it's their patch which is introducing a nondeterministic bug" fungi | 21:46 |
*** nati_ueno has joined #openstack-infra | 21:46 | |
fungi | ahh, that postulate | 21:46 |
*** kgriffs is now known as kgriffs_afk | 21:48 | |
*** senk has quit IRC | 21:49 | |
jog0 | clarkb: http://logs.openstack.org/97/44097/18/check/gate-tempest-devstack-vm-postgres-full/270e611/logs/screen-n-cpu.txt.gz#_2013-09-25_15_07_49_824 | 21:51 |
jog0 | I can't seem to find 'iSCSI device not found at' in elasticSearch | 21:51 |
jog0 | any ideas | 21:51 |
jog0 | ohh that stack trace is old | 21:52 |
jog0 | never mind | 21:52 |
jog0 | shitty bug reports | 21:52 |
*** dcramer_ has joined #openstack-infra | 21:52 | |
*** MarkAtwood has joined #openstack-infra | 21:54 | |
*** hogepodge has joined #openstack-infra | 21:54 | |
jog0 | so https://review.openstack.org/#/c/57509/ had 3 failed jobs | 21:54 |
jog0 | can someone help me hunt the cause down | 21:54 |
jog0 | dims | 21:54 |
openstackgerrit | Khai Do proposed a change to openstack-infra/config: add nodepool to jenkins-dev server https://review.openstack.org/57333 | 21:55 |
clarkb | jog0: portante was looking at them to see if the swift bugs which had fixes behind it caused it to fail | 21:55 |
jog0 | cool | 21:55 |
jog0 | we have http://logs.openstack.org/09/57509/2/gate/gate-grenade-devstack-vm/d2252da/ | 21:55 |
portante | still noodling over 509, we'd like to get the rsyslog buffers increased so we can see more | 21:55 |
jog0 | http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-postgres-full/9a14cc3/ | 21:55 |
jog0 | portante: 509 failure for which one | 21:55 |
clarkb | portante: I don't know if you caught it but we had to revert the larger rsyslog buffers because it was killing the syslog service | 21:56 |
jog0 | or http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-full/c500394/ | 21:56 |
portante | yuck | 21:56 |
portante | how much larger did you go? | 21:56 |
clarkb | portante: 65k iirc I will check logs | 21:56 |
fungi | apparently the volume of log data we spam to syslog during devstack/tempest is more than it can deal with | 21:56 |
portante | we don't need 65k buffers, just 4 or 6k | 21:57 |
*** kgriffs_afk is now known as kgriffs | 21:57 | |
portante | we have Tracebacks that are lost | 21:57 |
clarkb | 64k | 21:57 |
jog0 | lifeless: https://review.openstack.org/#/c/57509/ hit an AssertionError: Console output was empty. | 21:57 |
jog0 | http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-postgres-full/9a14cc3/testr_results.html.gz | 21:57 |
fungi | also, that may have been back when we were using 2gb flavors for devstack tests clarkb? | 21:57 |
jog0 | lifeless: so that wasn't the fix | 21:57 |
clarkb | portante: ok I can propose a change that bumps it to 6k | 21:57 |
clarkb | fungi: maybe | 21:57 |
*** sdake_ has quit IRC | 21:57 | |
portante | great | 21:58 |
jog0 | lifeless: although its *A* fix | 21:58 |
portante | I'll fix that PKI token logging change which should help to reduce the proxy log sizes, at least | 21:58 |
jog0 | this grenade failure is hurting us bad | 22:00 |
jog0 | mikal ^ | 22:00 |
jog0 | http://logs.openstack.org/09/57509/2/gate/gate-grenade-devstack-vm/d2252da/testr_results.html.gz' | 22:00 |
jog0 | HALP | 22:00 |
*** ericw has quit IRC | 22:00 | |
*** zul has quit IRC | 22:00 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Increase rsyslog buffer sizes. https://review.openstack.org/57538 | 22:00 |
*** sarob_ has joined #openstack-infra | 22:00 | |
clarkb | portante: fungi ^ | 22:01 |
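A minimal sketch of the sort of thing 57538 does, assuming rsyslog's standard $MaxMessageSize directive; the drop-in path is an assumption, and on some rsyslog versions the directive has to be set in rsyslog.conf before the input modules are loaded:

    # raise the per-message limit so long tracebacks and PKI tokens are not
    # truncated in syslog.txt (6k per portante's estimate above)
    echo '$MaxMessageSize 6k' | sudo tee /etc/rsyslog.d/00-maxsize.conf
    sudo service rsyslog restart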
*** sarob_ has quit IRC | 22:01 | |
portante | on it | 22:02 |
portante | clarkb: thanks | 22:02 |
mikal | Oh hai | 22:03 |
mikal | jog0: so the oslo thing didn't fix console logs? | 22:04 |
jog0 | mikal: no :( | 22:04 |
mikal | (Sorry, doing three things at once, so not sure where we're up to here) | 22:04 |
jog0 | it fixed something else though | 22:04 |
jog0 | just can't get it merged | 22:04 |
*** sarob has quit IRC | 22:04 | |
jog0 | mikal: we are nowhere here | 22:04 |
mikal | So... That revert didn't completely solve things. | 22:04 |
jog0 | nothing is merging still | 22:04 |
jog0 | mikal: we are on fire | 22:04 |
mikal | But it made things a little bit better. | 22:04 |
mikal | jog0: doomed! | 22:05 |
jog0 | mikal: we can't merge anything | 22:05 |
mikal | I feel a bit like we should stop all code approvals until we have this fixed | 22:05 |
jog0 | I am looking at this grenade failure http://logs.openstack.org/09/57509/2/gate/gate-grenade-devstack-vm/d2252da/logs/new/ | 22:05 |
jog0 | mikal: ++++ | 22:05 |
mikal | i.e. shut down the merge pipeline except for attempts to fix the gate | 22:05 |
mikal | Could we script removing the approve bit from all patches? | 22:05 |
mikal | Or add a -2 to all approved patches in a way we could detect and remove later? | 22:05 |
*** ryanpetrello has quit IRC | 22:05 | |
jeblair | i kind of think we should ask people to do that first | 22:06 |
mikal | This has been going for days now | 22:06 |
mikal | Its not fun any more | 22:06 |
* mikal has more gray hair | 22:06 | |
jeblair | and then solve it technically only if people don't listen | 22:06 |
clarkb | I'm with jeblair on this | 22:06 |
mikal | Certainly I think an email saying "approve nothing" is justified | 22:06 |
jeblair | clarkb: have you sent your email yet? | 22:07 |
clarkb | jeblair: I did send mine | 22:07 |
jog0 | mikal: works for me | 22:07 |
mikal | jog0: got any thoughts on that revert and if we should try it for reals? | 22:07 |
mikal | jog0: it doesn't seem conclusive to me... | 22:07 |
jeblair | mikal, jog0: maybe you could follow up clarkb's email with a stronger "okay, i really think we should stop approvals" msg | 22:07 |
mikal | Sure | 22:08 |
*** UtahDave has quit IRC | 22:09 | |
*** hogepodge has quit IRC | 22:09 | |
jog0 | I nominate mikal | 22:09 |
* mikal is drafting something now | 22:09 | |
jog0 | mikal: I am not sure about the revert | 22:09 |
*** branen has quit IRC | 22:10 | |
mikal | jog0: yeah, I had hope, but its not as clear cut as I was looking for | 22:10 |
fungi | clarkb: portante: on 57538, we're still going to have to wait for new nodepool images, unless we want to start new ones building here shortly and then expire out existing servers | 22:11 |
*** openstackstatus has joined #openstack-infra | 22:11 | |
clarkb | fungi: yeah, I can babysit that if we go down that road | 22:11 |
portante | fungi: you mean in order to get the bigger rsyslog buffers? | 22:11 |
fungi | portante: yes | 22:12 |
*** hogepodge has joined #openstack-infra | 22:12 | |
jog0 | mikal: I am looking into https://bugs.launchpad.net/tempest/+bug/1252170 which blocked the oslo sync | 22:12 |
uvirtbot | Launchpad bug 1252170 in tempest "tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm[compute] failed" [Critical,Confirmed] | 22:12 |
openstackgerrit | A change was merged to openstack-infra/config: Increase rsyslog buffer sizes. https://review.openstack.org/57538 | 22:12 |
fungi | portante: that's in a file baked into the machine image | 22:12 |
*** ryanpetrello has joined #openstack-infra | 22:12 | |
portante | fungi: oh, okay | 22:12 |
mikal | jog0: sigh | 22:12 |
jog0 | which we think fixes bug #1251784 | 22:12 |
uvirtbot | Launchpad bug 1251784 in tripleo "nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached (dup-of: 1251920)" [Critical,New] https://launchpad.net/bugs/1251784 | 22:12 |
uvirtbot | Launchpad bug 1251920 in nova "Tempest failures due to failure to return console logs from an instance" [Critical,In progress] https://launchpad.net/bugs/1251920 | 22:12 |
*** kgriffs is now known as kgriffs_afk | 22:12 | |
clarkb | fungi: I will kick off image builds via nodepool shortly (I believe we can manually trigger them) | 22:13 |
fungi | clarkb: yeah, it's easy, but you'll want to wait until that hits the puppet master first obviously | 22:13 |
mikal | jog0: what was the failure rate as a percentage for 1251920? | 22:13 |
clarkb | fungi: yup | 22:13 |
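The manual trigger clarkb mentions would look roughly like this with the nodepool CLI; the subcommand names are assumptions, so check nodepool help for the exact form:

    # rebuild the devstack image for one provider so new slaves pick up the
    # rsyslog change, then watch for the new image to reach the 'ready' state
    nodepool image-update <provider> devstack-precise
    nodepool image-list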
jog0 | mikal: we don't know, but a crap ton | 22:14 |
lifeless | thats the new metric | 22:14 |
jog0 | it had the highest rate by far | 22:14 |
fungi | i have to disappear shortly to cook dinner, then i'll jump back in and keep beating on things with the rest of you | 22:14 |
jog0 | but don't know % vs total | 22:14 |
mikal | jog0: so 2 out of 9 fails might be less? | 22:14 |
*** kgriffs_afk is now known as kgriffs | 22:14 | |
mikal | jog0: I'm wondering if we should just try the revert to see what happens at scale | 22:14 |
mikal | jog0: mostly because I am out of other ideas | 22:14 |
jog0 | mikal: hmm | 22:14 |
jog0 | I don't disagree with that logic | 22:15 |
*** ekarlso has quit IRC | 22:15 | |
jog0 | I think at some point we may need to do a git-bisect in all of icehouse | 22:15 |
mikal | That's super painful... | 22:15 |
mikal | We'd have to hook bisect up with git review somehow | 22:15 |
mikal | And then do rechecks a few times for each review | 22:15 |
mikal | I'm not sure what that would look like | 22:15 |
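One way to picture what mikal is describing: drive git bisect with a script that pushes each candidate revision up for testing and maps the gate result to an exit code. This is purely hypothetical -- check-gate-result.sh does not exist and would itself have to push via git-review, poll zuul/gerrit, and recheck a few times:

    # bisect between a known-good point (e.g. the havana release) and the bad tip
    git bisect start HEAD <last-known-good-ref>
    # for each revision bisect proposes, run the hypothetical helper, which
    # exits 0 if the gate jobs pass and non-zero if they fail
    git bisect run ./check-gate-result.sh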
clarkb | just revert all of icehouse and start over >_> <- sorry I couldn't help it | 22:16 |
mikal | clarkb: tempting | 22:16 |
*** DennyZhang has quit IRC | 22:16 | |
*** branen has joined #openstack-infra | 22:17 | |
*** ekarlso has joined #openstack-infra | 22:17 | |
clarkb | rebuilding d-g images now | 22:17 |
*** sarob has joined #openstack-infra | 22:17 | |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce grouping like imports together https://review.openstack.org/54402 | 22:18 |
openstackgerrit | Ben Nemec proposed a change to openstack-dev/hacking: Enforce import group ordering https://review.openstack.org/54403 | 22:18 |
fungi | fork openstack at havana and slowly cherry-pick everything back in ;) | 22:19 |
*** dkliban_ has quit IRC | 22:20 | |
mikal | fungi: well, Havana _works_ at least | 22:21 |
jog0 | fungi: just squash and bisect | 22:21 |
jog0 | shouldn't be too hard | 22:21 |
jog0 | mikal: we already are making sure its not a tempest issue | 22:21 |
fungi | clarkb: at the current rate of slave turnover, there may be little point in expiring existing servers. they seem to be getting used as fast as we can build replacements anyway | 22:22 |
fungi | judging from the state of the graph | 22:22 |
clarkb | fungi: nice, less work for me :) | 22:22 |
clarkb | 6 new images are building | 22:22 |
*** ryanpetrello_ has joined #openstack-infra | 22:23 | |
fungi | i suspect by the time you saw the image builds complete and wrote up the command to delete the ready images, they'd already be claimed by new jobs anyway | 22:23 |
anteaya | looking at https://review.openstack.org/#/c/57475/ Jenkins returns success but is not voting +1 | 22:25 |
anteaya | is that intentional? | 22:26 |
anteaya | I saw it on another patch earlier | 22:26 |
*** kgriffs is now known as kgriffs_afk | 22:26 | |
clarkb | anteaya: the check tests came back successful then the gate jobs started | 22:26 |
clarkb | anteaya: you are still waiting for the gate jobs to run | 22:26 |
*** ryanpetrello has quit IRC | 22:26 | |
*** ryanpetrello_ is now known as ryanpetrello | 22:26 | |
*** dangers is now known as danger_fo_away | 22:26 | |
anteaya | ah okay, thanks | 22:27 |
anteaya | the +1 disappears while the gate jobs are running | 22:27 |
lifeless | ok so does infra need more nodes? | 22:28 |
lifeless | clarkb: ^ | 22:28 |
mgagne | zaro: ping | 22:29 |
clarkb | jeblair: fungi: http://logs.openstack.org/06/57506/1/check/check-tempest-devstack-vm-postgres-full/9150091/logs/devstack-gate-setup-workspace-new.txt is an interesting failure | 22:29 |
clarkb | lifeless: not anymore | 22:29 |
clarkb | lifeless: killing things and focusing on fixing the gate has freed up breathing room | 22:29 |
clarkb | lifeless: we may need more when we get tests passing again, but I don't want to worry about that now | 22:30 |
mgagne | zaro: https://review.openstack.org/#/c/57525/ Antoine thinks it would be better to force jobparams to be a dict instead | 22:30 |
clarkb | jeblair: fungi: error: Failed connect to zuul.openstack.org:80; Connection timed out while accessing http://zuul.openstack.org/p/openstack/tempest/info/refs I am going to look at haproxy logs now | 22:30 |
clarkb | er nevermind that is zuul | 22:30 |
clarkb | hmm load average on zuul is >20 | 22:33 |
lifeless | bussay | 22:33 |
chmouel | clarkb: any chance to get a second +2 on this https://review.openstack.org/#/c/56927 (should be trivial) | 22:34 |
*** kgriffs_afk is now known as kgriffs | 22:34 | |
chmouel | or anyone else from infra with +2 ^ | 22:34 |
*** joshuamckenty has joined #openstack-infra | 22:34 | |
clarkb | chmouel: right now new projects are basically at the bottom of the priority queue | 22:34 |
*** dcramer_ has quit IRC | 22:34 | |
jog0 | chmouel: http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html | 22:35 |
joshuamckenty | Hey happy -infra folks | 22:35 |
clarkb | chmouel: happy to look at it once things settle down | 22:35 |
chmouel | ah how sorry guys | 22:35 |
joshuamckenty | could you set up a new lists.openstack.org mailing list for the DefCore committee please? | 22:35 |
jog0 | joshuamckenty: http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html | 22:35 |
chmouel | clarkb, jog0: good lucks to you all | 22:35 |
joshuamckenty | we need myself (josh@openstack.org) and Rob Hirschfeld (rob_hirschfeld@dell.com) as admins | 22:35 |
clarkb | joshuamckenty: you can propose the change in puppet | 22:35 |
joshuamckenty | gotcha | 22:35 |
clarkb | joshuamckenty: openstack-infra/config/modules/openstack_project/manifests/lists.pp there should be examples in that file | 22:36 |
joshuamckenty | sweet | 22:36 |
jeblair | joshuamckenty: http://ci.openstack.org/lists.html <-- docs | 22:36 |
*** thomasem has quit IRC | 22:36 | |
portante | jog0: new patch to devstack/lib/swift | 22:36 |
portante | patches, really | 22:36 |
clarkb | jeblair: lots of git upload packs running on zuul. maybe we need get those behind git.o.o by replicating zuul refs there? that ends up being messy iirc | 22:37 |
*** changbl has quit IRC | 22:38 | |
jeblair | clarkb: yes, if we want to spread that load, we should probably replicate to different repos; i'm not keen on zuul refs being in the canonical ones | 22:38 |
jeblair | clarkb: we could potentially use the same git server farm just with different paths, or make another farm | 22:39 |
*** jhesketh_ has quit IRC | 22:39 | |
*** thingee has joined #openstack-infra | 22:39 | |
clarkb | jeblair: possibly split the farm as I think it is currently far overpowered | 22:39 |
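For context on where that load comes from: each devstack-gate slave fetches the speculative ref zuul prepared for its change directly from zuul's repos over HTTP, and each of those fetches spawns a git-upload-pack process on the zuul host. Roughly what a slave does, where <zuul-ref> stands in for the ZUUL_REF value the job is given (illustrative, not an exact value):

    # fetch the merged-state ref zuul built for this change, then check it out
    git fetch http://zuul.openstack.org/p/openstack/tempest <zuul-ref>
    git checkout FETCH_HEAD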
*** jhesketh_ has joined #openstack-infra | 22:40 | |
thingee | noticed a temporary connection issue to mirror.rackspace.com | 22:40 |
thingee | http://logs.openstack.org/06/57406/2/check/check-tempest-devstack-vm-neutron/590e7e0/console.html | 22:40 |
*** MarkAtwood has quit IRC | 22:42 | |
*** MarkAtwood has joined #openstack-infra | 22:42 | |
*** yassine has quit IRC | 22:42 | |
jog0 | thingee: we have bigger issues ATM http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html | 22:42 |
clarkb | and there isn't much we can do about that (other than run our own mirror) | 22:42 |
*** mriedem has quit IRC | 22:43 | |
*** dkranz has quit IRC | 22:43 | |
*** kgriffs is now known as kgriffs_afk | 22:43 | |
*** branen has quit IRC | 22:44 | |
jeblair | if it keeps happening, we will, but it's generally fairly stable. it's good to have reports of when it does happen so we can keep an eye on it. | 22:44 |
jog0 | hopefully this will fix the grenade issue https://review.openstack.org/#/c/57357/ | 22:44 |
*** hogepodge has quit IRC | 22:44 | |
jeblair | #status alert Please refrain from approving changes that don't fix gate-blocking issues -- http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html | 22:45 |
openstackstatus | NOTICE: Please refrain from approving changes that don't fix gate-blocking issues -- http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html | 22:45 |
*** ChanServ changes topic to "Please refrain from approving changes that don't fix gate-blocking issues -- http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html" | 22:45 | |
thingee | jeblair: thanks | 22:45 |
*** zul has joined #openstack-infra | 22:45 | |
*** hdd has quit IRC | 22:45 | |
clarkb | having the data is useful, in fact I think there is a bug for that problem /me looks really fast we can add this as a data point | 22:46 |
*** rcleere has quit IRC | 22:46 | |
lifeless | clarkb: are we starved of non-dg nodes? e.g. will things that don't trigger d-g be ok? | 22:47 |
*** ilyashakhat has quit IRC | 22:47 | |
lifeless | clarkb: or do you want everything halted ? | 22:47 |
clarkb | thingee: https://bugs.launchpad.net/openstack-ci/+bug/1251117 is the bug, you can attach a comment with a link to that build log | 22:47 |
uvirtbot | Launchpad bug 1251117 in openstack-ci "devstack-vm gate failure due to apt-get download failure" [Undecided,New] | 22:47 |
clarkb | lifeless: currently we seem to be ok with non d-g slaves | 22:47 |
jog0 | we are close to getting a patch merged ! | 22:48 |
clarkb | lifeless: you are probably ok approving things that are not part of the integrated gate | 22:48 |
clarkb | jog0: woot | 22:48 |
lifeless | I'm just swimming up through my daily review stuff, seeking to avoid causing headaches | 22:48 |
lifeless | clarkb: ok, thanks | 22:48 |
*** ilyashakhat has joined #openstack-infra | 22:48 | |
thingee | clarkb: done | 22:49 |
openstackgerrit | Joshua McKenty proposed a change to openstack-infra/config: Adding the DefCore committee list https://review.openstack.org/57547 | 22:51 |
*** pcm_ has quit IRC | 22:51 | |
*** dkliban_ has joined #openstack-infra | 22:51 | |
fungi | lifeless: clarkb: we do end up starved for all nodes once the gate grows deep enough that there are more jobs than we have machines (in terms of nodepool quota room and static slaves) due to thrash when there are gate resets, but i don't know that more nodes really solves that situation | 22:52 |
* portante steps out for a bit, back in a few hours | 22:54 | |
clarkb | fungi: portante: new d-g images in hpcloud are in. still waiting on rax | 22:55 |
*** joshuamckenty has quit IRC | 22:56 | |
*** branen has joined #openstack-infra | 22:56 | |
*** weshay has quit IRC | 22:58 | |
*** masayukig has joined #openstack-infra | 23:00 | |
*** eharney has quit IRC | 23:02 | |
*** jhesketh__ has joined #openstack-infra | 23:03 | |
*** sarob has quit IRC | 23:03 | |
*** Ng has quit IRC | 23:03 | |
*** sarob has joined #openstack-infra | 23:03 | |
jog0 | so we have 4 patches we think should help | 23:07 |
jog0 | but don't have a fix for the big one, the console log | 23:07 |
jog0 | mikal lifeless ^ | 23:07 |
jog0 | I think we are back to the drawing board for console log | 23:07 |
mikal | jog0: as in the revert is a dead end? | 23:07 |
mikal | jog0: bugger | 23:07 |
jog0 | mikal: so we didn't try the revert neutron bug | 23:08 |
*** sarob has quit IRC | 23:08 | |
jog0 | mikal: I am refering to https://review.openstack.org/#/c/57509/ | 23:08 |
*** joshuamckenty has joined #openstack-infra | 23:08 | |
*** joshuamckenty has quit IRC | 23:10 | |
jog0 | mikal:if you have any idea I am for it | 23:13 |
*** flaper87 is now known as flaper87|afk | 23:13 | |
anteaya | jog0: something -neutron can help with? | 23:15 |
anteaya | we are standing by | 23:15 |
*** michchap_ has quit IRC | 23:15 | |
mikal | jog0: I am out of ideas | 23:16 |
mikal | I will continue grinding though | 23:16 |
*** changbl has joined #openstack-infra | 23:17 | |
jog0 | mikal: yeah me too | 23:17 |
jog0 | mikal: if want a different bug | 23:18 |
*** michchap has joined #openstack-infra | 23:18 | |
anteaya | hopefully when 57290 goes through, https://bugs.launchpad.net/swift/+bug/1224001 will be fixed from -neutron | 23:18 |
uvirtbot | Launchpad bug 1224001 in neutron "test_network_basic_ops fails waiting for network to become available" [High,In progress] | 23:18 |
*** slong has joined #openstack-infra | 23:19 | |
jog0 | anteaya: fingers crossed | 23:19 |
*** herndon_ has quit IRC | 23:20 | |
anteaya | yeah here too | 23:21 |
clarkb | jog0: can we get a tl;dr status report? 57509 does not fix the console thing but does fix a valid bug, 57290 fixes a neutron bug, both are in the gate now, fingers crossed | 23:21 |
clarkb | jog0: 57357 is also in the gate to disable the v3 tests because they were causing problems with grenade | 23:22 |
clarkb | does that leave us with the major bug being 1251920? | 23:22 |
jog0 | clarkb: clueless on 1251920 | 23:23 |
jog0 | mikal and I are questioning our sanity | 23:23 |
clarkb | fungi: portante: every cloud region but rax dfw has the new d-g image | 23:24 |
jog0 | clarkb: looking for other ideas | 23:24 |
jog0 | may just revert all the things | 23:24 |
*** fifieldt has joined #openstack-infra | 23:24 | |
clarkb | I suggested that crazy idea and think it is overkill | 23:25 |
clarkb | mostly because for example with tempest the diff is >7k lines between now and havana | 23:25 |
clarkb | I imagine nova is worse | 23:25 |
jog0 | clarkb: and we have https://bugs.launchpad.net/tempest/+bug/1252170 which we have a partial fix for | 23:26 |
uvirtbot | Launchpad bug 1252170 in tempest "tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm[compute] failed" [Critical,In progress] | 23:26 |
clarkb | jog0: is that in the gate yet? /me looks at bug | 23:26 |
*** loq_mac has joined #openstack-infra | 23:28 | |
*** datsun180b has quit IRC | 23:29 | |
jog0 | part is (the v3 disable) | 23:29 |
notmyname | are you suggesting reverting every patch that has landed in every openstack project since havana and then running them each through to see what breaks and find the cause of the instability? | 23:29 |
notmyname | as the crazy idea | 23:29 |
jog0 | notmyname: if that was easy, then yes | 23:30 |
notmyname | jog0: I posit that it isn't easy ;-) | 23:30 |
clarkb | the "joke" I made was revert to havana and start over :) nevermind trying to find what causes the breakage :) | 23:30 |
notmyname | ya. just making sure I actually read all that correctly | 23:30 |
jog0 | notmyname: revert large chunks of nova | 23:31 |
jog0 | (not merge though) | 23:31 |
*** markmc has quit IRC | 23:34 | |
*** thedodd has quit IRC | 23:34 | |
*** sdake_ has joined #openstack-infra | 23:37 | |
*** mfer has quit IRC | 23:37 | |
clarkb | jog0: https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-large-ops/14880/console :( | 23:38 |
jog0 | clarkb: WAT | 23:39 |
jog0 | what's that from | 23:39 |
fungi | so... maybe not entirely easy but... hook git bisect into git review coupled with multiple rechecks and see what you can zero in on? probably completely bonkers | 23:39 |
clarkb | jog0: http://logs.openstack.org/09/57509/2/gate/gate-tempest-devstack-vm-large-ops/1171354/logs/screen-key.txt.gz | 23:40 |
clarkb | keystone is saying the address is already in use, we have seen that infrequently. jeblair double checked last time to make sure the slave wasn't used twice | 23:40 |
jeblair | fungi: better if you can also mark them WIP as you add them | 23:40 |
clarkb | I can do a quick sanity check on that now | 23:40 |
clarkb | the node is gone in jenkins, on to the logstash | 23:41 |
jeblair | clarkb: http://paste.openstack.org/show/53707/ nodepool says used once; feel free to confirm with logstash | 23:43 |
anteaya | really turbo-hipster? | 23:43 |
fungi | anteaya: i hear you can blame jhesketh_ | 23:43 |
clarkb | jeblair: logstash confirms tags:"console.html" AND message:"Building remotely on" AND message:"devstack-precise-hpcloud-az2-703687" | 23:43 |
anteaya | jhesketh_: really? | 23:43 |
jhesketh__ | anteaya: umm? what am I being blamed for? | 23:44 |
fungi | jhesketh__: taking pride in random program names, i think | 23:45 |
jhesketh__ | oh right, yes, you can blame me for that | 23:45 |
anteaya | no, for squeezing in a gate change while we are all on gate lockdown | 23:45 |
anteaya | gate blocking bug fixes only, everyone watching zuul TV | 23:46 |
fungi | oh, i was lacking in context then, sorry | 23:46 |
jog0 | turbo hipster lock down | 23:46 |
anteaya | and up pops a turbo-hipster | 23:46 |
jhesketh__ | ah, so I didn't realise it was also on lockdown for stackforge too (I'm a bit out of the loop) | 23:46 |
jeblair | anteaya: what change? | 23:46 |
jhesketh__ | very sorry guys, I'll stop | 23:46 |
clarkb | anteaya: I semi sort of said people doing things for projects not part of the integrated gate were fine | 23:46 |
clarkb | anteaya: we have plenty of non d-g slaves doing nothing | 23:46 |
anteaya | jeblair: it just finished, a stackforge | 23:46 |
anteaya | jhesketh__: was funny too | 23:47 |
jeblair | jhesketh__, anteaya: yeah, don't worry about it if it's not in the devstack/tempest change queue (the 'openstack integrated gate') | 23:47 |
anteaya | ah okay | 23:47 |
jhesketh__ | anteaya: by zuul TV do you mean the status page? | 23:47 |
fungi | yeah, things running devstack-tempest jobs mostly | 23:47 |
anteaya | I've come down pretty hard on -neutron, just wanting to keep the consistency | 23:47 |
jhesketh__ | jeblair: okay, so it's cool if I merge through another 6 or so patches? (I'm happy to wait anyway) | 23:47 |
anteaya | and btw they have responded well | 23:47 |
anteaya | jhesketh__: yes I call the status page zuul TV | 23:48 |
jeblair | i mean, it's full name is "openstack-dev/devstack, openstack-dev/grenade, openstack-dev/pbr, openstack-infra/devstack-gate, openstack-infra/jeepyb, openstack-infra/pypi-mirror, openstack/ceilometer, openstack/cinder, openstack/glance, openstack/heat, openstack/horizon, openstack/keystone, openstack/neutron, openstack/nova, openstack/oslo.config, openstack/oslo.messaging, openstack/oslo.version, openstack/python-ceilometerclient, openstack/p | 23:48 |
fungi | [...] | 23:48 |
jeblair | but we like to abbreviate it | 23:49 |
jeblair | jhesketh__: yeah it won't hurt anything | 23:49 |
jhesketh__ | okay cool | 23:49 |
mordred | jeblair: I think we should start calling it by its full name | 23:50 |
jhesketh__ | I know you guys have been working really hard on fixing the gates, let me know if I can do anything to help! :-) | 23:50 |
jog0 | jhesketh__: we got a nova bug for you | 23:50 |
jhesketh__ | the one everybody is stuck on? | 23:50 |
jog0 | no the otherone | 23:51 |
jog0 | :) | 23:51 |
jhesketh__ | oh I can look at the other one | 23:51 |
jog0 | jhesketh__: https://bugs.launchpad.net/tempest/+bug/1252170 | 23:51 |
uvirtbot | Launchpad bug 1252170 in tempest "tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm[compute] failed" [Critical,In progress] | 23:51 |
jog0 | we are banging away on that in nova room | 23:52 |
*** thomasem has joined #openstack-infra | 23:53 | |
*** mrodden has quit IRC | 23:54 | |
jhesketh__ | yep, watching :-) | 23:54 |
*** julim has quit IRC | 23:55 | |
jog0 | clarkb: https://review.openstack.org/57566 | 23:58 |
jog0 | hehe | 23:58 |
clarkb | jog0: I think I know what the problem is with keystone not starting. the default keystone port is in the default linux local ephemeral port range | 23:58 |
jog0 | revert all work from this week | 23:58 |
jog0 | clarkb: FAIL | 23:59 |
jog0 | wow | 23:59 |
clarkb | jog0: I will propose a change to devstack probably to come up with a fix | 23:59 |
jog0 | you should file a bug for that | 23:59 |
clarkb | basically shift the ephemeral port range | 23:59 |
jeblair | clarkb: what's the default port? | 23:59 |
clarkb | jog0: it is the IANA assigned port | 23:59 |
clarkb | jeblair: 35357 | 23:59 |
clarkb | linux range is 32768 to 61000 | 23:59 |
jeblair | that's terrible | 23:59 |
lifeless | https://blueprints.launchpad.net/keystone/+spec/iana-register-port | 23:59 |
clarkb | so unlikely to have problems but when you run as many tests as we do... | 23:59 |
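A minimal sketch of the collision and the two obvious workarounds, using the standard Linux sysctls; the values mirror the numbers quoted above, and which knob devstack actually ends up tweaking is still open at this point in the discussion:

    # keystone's admin port sits inside the default ephemeral range
    cat /proc/sys/net/ipv4/ip_local_port_range   # typically "32768 61000", which contains 35357
    # option 1: shift the ephemeral range so it no longer covers 35357
    sudo sysctl -w net.ipv4.ip_local_port_range="49152 61000"
    # option 2: reserve the port so the kernel never hands it out as a source port
    sudo sysctl -w net.ipv4.ip_local_reserved_ports=35357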