*** mrodden has quit IRC | 00:01 | |
clarkb | mordred: I left a comment on the change, I think I managed to express my concern properly, but let me know if it isn't clear | 00:02 |
*** vipul is now known as vipul-away | 00:04 | |
*** openstackgerrit has quit IRC | 00:04 | |
*** openstackgerrit has joined #openstack-infra | 00:05 | |
*** pcm_ has quit IRC | 00:06 | |
*** krtaylor has joined #openstack-infra | 00:07 | |
*** weshay has quit IRC | 00:09 | |
*** mrodden has joined #openstack-infra | 00:14 | |
openstackgerrit | Dan Bode proposed a change to openstack-infra/config: Add stackforge project: puppet_openstack_builder https://review.openstack.org/51079 | 00:15 |
*** alexpilotti has quit IRC | 00:15 | |
clarkb | mordred: https://review.openstack.org/#/c/33926/5 if I +2 that do you want to babysit an approval? | 00:15 |
clarkb | I need to drop offline here for a bit in order to get more stuff done prior to the weekend | 00:16 |
* clarkb AFKs to do that. I did +2 the change. I think it just needs a sanity check once in so that the next gerrit restart doesn't go sideways | 00:17 | |
*** melwitt has quit IRC | 00:17 | |
*** vipul-away is now known as vipul | 00:17 | |
*** dripton has joined #openstack-infra | 00:17 | |
*** melwitt has joined #openstack-infra | 00:19 | |
*** alchen99 has quit IRC | 00:20 | |
openstackgerrit | A change was merged to openstack-infra/config: Document how to delete a pad from Etherpad Lite https://review.openstack.org/46329 | 00:20 |
*** CaptTofu has quit IRC | 00:21 | |
*** CaptTofu has joined #openstack-infra | 00:21 | |
*** hogepodge has quit IRC | 00:22 | |
openstackgerrit | Dan Bode proposed a change to openstack-infra/config: Add stackforge project: puppet_openstack_builder https://review.openstack.org/51079 | 00:22 |
*** amotoki has joined #openstack-infra | 00:23 | |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Add repo scm https://review.openstack.org/45165 | 00:23 |
*** dripton has quit IRC | 00:25 | |
*** matsuhashi has joined #openstack-infra | 00:27 | |
*** senk has joined #openstack-infra | 00:27 | |
*** oubiwann_ has quit IRC | 00:27 | |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Improve fallback to master branch https://review.openstack.org/49894 | 00:27 |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Revert "Revert "Enable q-vpn service"" https://review.openstack.org/50242 | 00:27 |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Conditionally override PyPI for reqs integration https://review.openstack.org/50198 | 00:27 |
*** dripton has joined #openstack-infra | 00:33 | |
*** gyee has quit IRC | 00:35 | |
*** dripton has quit IRC | 00:42 | |
*** sandywalsh_ has joined #openstack-infra | 00:42 | |
*** sandywalsh has quit IRC | 00:43 | |
*** nosnos has joined #openstack-infra | 00:44 | |
*** dripton has joined #openstack-infra | 00:45 | |
*** krtaylor has quit IRC | 00:57 | |
*** sarob has joined #openstack-infra | 00:58 | |
*** senk has quit IRC | 01:02 | |
*** wenlock has quit IRC | 01:07 | |
*** sarob has quit IRC | 01:08 | |
*** sarob has joined #openstack-infra | 01:09 | |
*** melwitt has quit IRC | 01:12 | |
*** DennyZhang has joined #openstack-infra | 01:12 | |
*** senk has joined #openstack-infra | 01:14 | |
stevebaker | hey, are there some permissions I need to review heat proposals on http://summit.openstack.org/ ? | 01:18 |
*** markmcclain has joined #openstack-infra | 01:23 | |
*** mriedem has joined #openstack-infra | 01:24 | |
lifeless | PTL | 01:26 |
*** yaguang has joined #openstack-infra | 01:27 | |
*** yaguang has quit IRC | 01:27 | |
*** yaguang has joined #openstack-infra | 01:28 | |
*** basha has joined #openstack-infra | 01:31 | |
*** senk has quit IRC | 01:39 | |
*** chris613 has quit IRC | 01:48 | |
*** guohliu has quit IRC | 01:49 | |
*** jhesketh__ has quit IRC | 01:57 | |
*** wenlock has joined #openstack-infra | 02:01 | |
*** jhesketh has joined #openstack-infra | 02:02 | |
*** ArxCruz has joined #openstack-infra | 02:05 | |
*** dkranz has joined #openstack-infra | 02:08 | |
*** ArxCruz_ has joined #openstack-infra | 02:12 | |
*** fifieldt has joined #openstack-infra | 02:13 | |
*** xchu has joined #openstack-infra | 02:14 | |
*** ArxCruz has quit IRC | 02:15 | |
*** sarob has quit IRC | 02:15 | |
*** ArxCruz_ has quit IRC | 02:20 | |
*** alchen99 has joined #openstack-infra | 02:24 | |
*** krtaylor has joined #openstack-infra | 02:26 | |
*** alchen99 has quit IRC | 02:36 | |
*** senk has joined #openstack-infra | 02:40 | |
*** guohliu has joined #openstack-infra | 02:43 | |
*** locke105 has quit IRC | 02:44 | |
*** crank has quit IRC | 02:44 | |
*** kpepple has quit IRC | 02:44 | |
*** alaski has quit IRC | 02:44 | |
*** mkerrin has quit IRC | 02:44 | |
*** guitarzan has quit IRC | 02:44 | |
*** Reapster has quit IRC | 02:44 | |
*** Vivek has quit IRC | 02:44 | |
*** davidlenwell has quit IRC | 02:44 | |
*** BobBall has quit IRC | 02:44 | |
*** Ng has quit IRC | 02:44 | |
*** alaski_ has joined #openstack-infra | 02:44 | |
*** BobBall has joined #openstack-infra | 02:44 | |
*** guitarzan has joined #openstack-infra | 02:44 | |
*** Reapster has joined #openstack-infra | 02:44 | |
*** crank has joined #openstack-infra | 02:44 | |
*** Vivek has joined #openstack-infra | 02:44 | |
*** kpepple has joined #openstack-infra | 02:44 | |
*** Ng has joined #openstack-infra | 02:44 | |
*** locke105 has joined #openstack-infra | 02:44 | |
*** senk has quit IRC | 02:44 | |
*** Vivek is now known as Guest86586 | 02:45 | |
*** mkerrin has joined #openstack-infra | 02:45 | |
*** davidlenwell has joined #openstack-infra | 02:45 | |
*** erfanian has joined #openstack-infra | 02:49 | |
*** mriedem has quit IRC | 02:49 | |
*** matsuhashi has quit IRC | 02:57 | |
lifeless | mordred: you might care about https://bugs.launchpad.net/tripleo/+bug/1222306 | 02:57 |
uvirtbot | Launchpad bug 1222306 in tripleo "can't install keystone with pypi mirror" [Medium,Triaged] | 02:57 |
lifeless | mordred: or https://bugs.launchpad.net/tripleo/+bug/1222308 | 02:57 |
uvirtbot | Launchpad bug 1222308 in tripleo "can't install cinderclient with pypi mirror" [Medium,Triaged] | 02:57 |
*** HenryG has joined #openstack-infra | 02:58 | |
clarkb | lifeless: we really should require <0.8alpha or whatever the lowest 0.8 version is | 02:59 |
lifeless | clarkb: of requests? | 03:00 |
*** basha has quit IRC | 03:00 | |
clarkb | lifeless: sqlalchemy | 03:00 |
clarkb | it's silly we can't just say <0.8 | 03:01 |
mordred | ah. fascinating | 03:01 |
mordred | clarkb: we can with pip 1.4 | 03:01 |
lifeless | clarkb: oh right, there are two distinct bugs | 03:01 |
clarkb | mordred: right, but everyone else doesn't do new pip | 03:01 |
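The pinning problem clarkb and mordred are circling can be sketched as hypothetical requirements lines (the exact version numbers are illustrative, not taken from the real global-requirements file):

```
# Hypothetical requirements.txt lines.
# Under pip < 1.4, "<0.8" can still admit pre-releases such as 0.8b1,
# so the bound has to sit below the earliest 0.8 pre-release:
SQLAlchemy>=0.7.8,<=0.7.99
# pip >= 1.4 skips pre-releases unless --pre is passed, so there
# the simpler form works:
# SQLAlchemy<0.8
```

This is why clarkb suggests "<0.8alpha or whatever the lowest 0.8 version is" for consumers still on older pip.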
lifeless | mordred: yeah, I found this testing --offline with a fresh mirror | 03:01 |
lifeless | mordred: so this is in the 'stuff we don't mirror in' category | 03:01 |
lifeless | the problem is global requirements doesn't list all the different requirements all releases of clients had | 03:02 |
mordred | lifeless: yah. https://review.openstack.org/#/q/topic:openstack/requirements,n,z | 03:02 |
clarkb | mordred: https://review.openstack.org/#/c/51053/ | 03:03 |
mordred | clarkb: I think we have a script bug: https://review.openstack.org/#/c/49201/ | 03:03 |
mordred | look at the commit message | 03:03 |
*** flaper87|afk has quit IRC | 03:03 | |
*** flaper87|afk has joined #openstack-infra | 03:03 | |
clarkb | we do, 51053 fixes it :) | 03:03 |
*** mkerrin has quit IRC | 03:03 | |
*** mkerrin has joined #openstack-infra | 03:03 | |
*** HenryG has quit IRC | 03:03 | |
*** HenryG has joined #openstack-infra | 03:03 | |
lifeless | mordred: I'm not sure how that will fix the issue | 03:03 |
lifeless | mordred: we're installing releases | 03:03 |
mordred | done | 03:04 |
mordred | what? | 03:04 |
lifeless | mordred: when we pip install nova trunk | 03:04 |
lifeless | mordred: we get a release of python-neutronclient | 03:04 |
mordred | yah | 03:04 |
mordred | k | 03:04 |
* mordred bats eyelashes | 03:04 | |
lifeless | mordred: if the current requirements rules don't bring down versions that match the requirements when the release of that client was cut | 03:05 |
mordred | all of the projects should merge all of those changes and then cut releases | 03:05 |
clarkb | mordred: thanks. I also made sure to document why that horrible read into a variable trick is used | 03:05 |
mordred | hrm. ok | 03:05 |
mordred | lifeless: I grok what you are saying | 03:05 |
clarkb | because I keep forgetting why we did that and I don't want to have to remember | 03:05 |
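The "horrible read into a variable trick" clarkb documents is most likely the bash `read -d ''` idiom; a sketch under that assumption (the file name here is made up):

```shell
# Sketch of the bash "read a whole file into a variable" idiom
# (an assumption about the trick being referenced; file is hypothetical).
printf 'line1\nline2\n\n' > /tmp/example.txt

# Command substitution strips ALL trailing newlines:
stripped=$(</tmp/example.txt)

# read -r -d '' preserves the content byte-for-byte, but returns
# nonzero at EOF -- which is what makes the idiom look "horrible"
# and forces the trailing || true.
IFS= read -r -d '' kept < /tmp/example.txt || true

printf '%s' "$stripped" | wc -c   # 11 bytes: trailing newlines dropped
printf '%s' "$kept" | wc -c       # 13 bytes: file content intact
```

The nonzero exit status is exactly the kind of surprise worth a comment in the script, which is presumably why it kept needing to be re-remembered.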
lifeless | mordred: I don't claim to have an answer yet | 03:05 |
lifeless | mordred: just thought you should have it in your thinking cap | 03:05 |
openstackgerrit | A change was merged to openstack-infra/config: Use a single change ID per requirement proposal. https://review.openstack.org/51053 | 03:05 |
mordred | lifeless: I think this may fall in to the category of things that jeblair was worried about in terms of enabling use of our mirror for non-gate activities | 03:05 |
mordred | lifeless: which is to say, I think it may have some design holes | 03:06 |
lifeless | mordred: we're not using your mirror yet | 03:06 |
lifeless | mordred: this is a fresh run-mirror'd mirror | 03:06 |
mordred | lifeless: yup. I grok. but the mirror script is designed to keep a running mirror | 03:06 |
lifeless | mordred: right, ack. | 03:06 |
mordred | lifeless: thinking cap on - btw | 03:06 |
mordred | this is my way of thinking | 03:06 |
lifeless | once we get sophisticated enough in our CI | 03:07 |
lifeless | we'll spin up new mirrors as part of the test | 03:07 |
lifeless | and detect this | 03:07 |
mordred | I will be honest - my most recent thinking has been to investigate use of devpi | 03:07 |
lifeless | s/the test/a test/ | 03:07 |
lifeless | mordred: fully offline is very attractive for dc bringup stories | 03:07 |
mordred | yup. devpi has fully offline | 03:07 |
lifeless | mordred: so I'm not super keen on devpi | 03:07 |
lifeless | mordred: I thought it only captured what you used? | 03:08 |
mordred | it also has pockets | 03:08 |
mordred | so you can have a "mirror upstream" pocket, and a "my local stuff" which depends on the "mirror upstream" | 03:08 |
lifeless | mordred: so devpi would demonstrate the same failure mode | 03:08 |
mordred | so pointing at my local stuff will get you both | 03:08 |
mordred | lifeless: yes. I'm just saying | 03:08 |
lifeless | ok, tangent, sure. | 03:08 |
mordred | I've been thinking that richer implementation scripting might be better served at this point by devpi instead of pypi-mirror | 03:09 |
mordred | BUT | 03:09 |
mordred | I support the goal you are expressing | 03:09 |
lifeless | cool | 03:09 |
mordred | ish | 03:09 |
mordred | sort of | 03:09 |
mordred | I mean | 03:09 |
mordred | yeah | 03:09 |
lifeless | so I suspect we're going to be gating a different scenario than the gate currently does | 03:09 |
mordred | yup | 03:09 |
lifeless | I'm thinking I should mail the list when we're in sight of success | 03:09 |
lifeless | and get discussion | 03:09 |
lifeless | and/or a session in the CFP at the project level I guess | 03:09 |
mordred | oy | 03:10 |
clarkb | mordred: are you thinking we should use devpi for our mirror too? | 03:10 |
*** dims has quit IRC | 03:10 | |
mordred | clarkb: toying with the idea | 03:10 |
mordred | clarkb: the fact that it supports multiple sets of things | 03:10 |
mordred | clarkb: and local uploads | 03:10 |
mordred | but also linking things | 03:10 |
mordred | is very attractive | 03:11 |
mordred | downside: it serves things from python instead of apache | 03:11 |
clarkb | right, I was just going to ask about that | 03:11 |
mordred | yup. that's the asinine part | 03:11 |
mordred | but also the part that allows you to describe sets that depend on other sets | 03:11 |
mordred | so, you know, feature. bug. | 03:11 |
mordred | also - I'm thrilled that 3rd party testing has finally caught on | 03:13 |
mordred | it only took a year | 03:13 |
mordred | maybe a year and a half | 03:13 |
clarkb | mordred: so I was thinking about swift logs and realized we should just put our mirror in swift too | 03:13 |
mordred | how long have we been doing this? | 03:13 |
mordred | clarkb: totes | 03:13 |
clarkb | mordred: then we can manage a single index.html file | 03:13 |
clarkb | and maybe not even that | 03:13 |
clarkb | mordred: nova is requiring it for their hypervisors | 03:14 |
clarkb | mordred: I think ssh will always be the way to go for third party testing (because event stream > polling) | 03:15 |
*** wenlock_ has joined #openstack-infra | 03:18 | |
*** wenlock has quit IRC | 03:19 | |
*** wenlock_ is now known as wenlock | 03:19 | |
mordred | ++ | 03:20 |
mordred | amazing how russellb telling people they have to do it or they're going to get dropped gets further than us offering that they can do it and people can track the quality of their driver | 03:20 |
*** matsuhashi has joined #openstack-infra | 03:24 | |
*** matsuhashi has quit IRC | 03:31 | |
*** matsuhashi has joined #openstack-infra | 03:32 | |
*** guitarzan has quit IRC | 03:34 | |
*** alaski_ has quit IRC | 03:34 | |
*** dkranz has quit IRC | 03:34 | |
*** jhesketh has quit IRC | 03:34 | |
*** nosnos has quit IRC | 03:34 | |
*** michchap has quit IRC | 03:34 | |
*** uvirtbot has quit IRC | 03:34 | |
*** Ryan_Lane has quit IRC | 03:34 | |
*** SlickNik has quit IRC | 03:34 | |
*** freyes has quit IRC | 03:34 | |
*** mkoderer has quit IRC | 03:34 | |
*** slong has quit IRC | 03:34 | |
*** guitarzan has joined #openstack-infra | 03:34 | |
*** alaski has joined #openstack-infra | 03:34 | |
*** freyes has joined #openstack-infra | 03:34 | |
*** mkoderer_ has joined #openstack-infra | 03:34 | |
*** SlickNik has joined #openstack-infra | 03:34 | |
*** dkranz has joined #openstack-infra | 03:34 | |
*** slong has joined #openstack-infra | 03:34 | |
*** jhesketh has joined #openstack-infra | 03:34 | |
*** nosnos has joined #openstack-infra | 03:34 | |
*** michchap has joined #openstack-infra | 03:34 | |
*** Ryan_Lane has joined #openstack-infra | 03:35 | |
*** Ryan_Lane has quit IRC | 03:35 | |
*** Ryan_Lane has joined #openstack-infra | 03:35 | |
*** matsuhashi has quit IRC | 03:36 | |
*** senk has joined #openstack-infra | 03:41 | |
*** matsuhashi has joined #openstack-infra | 03:41 | |
*** matsuhashi has quit IRC | 03:41 | |
*** matsuhashi has joined #openstack-infra | 03:42 | |
*** senk has quit IRC | 03:45 | |
*** basha has joined #openstack-infra | 03:45 | |
*** matsuhashi has quit IRC | 03:46 | |
*** CaptTofu has quit IRC | 03:47 | |
*** CaptTofu has joined #openstack-infra | 03:48 | |
*** matsuhashi has joined #openstack-infra | 03:49 | |
*** basha_ has joined #openstack-infra | 03:50 | |
*** basha has quit IRC | 03:52 | |
*** basha_ is now known as basha | 03:52 | |
*** basha has quit IRC | 03:53 | |
*** SergeyLukjanov has joined #openstack-infra | 03:54 | |
*** jerryz has quit IRC | 04:01 | |
*** wenlock has quit IRC | 04:04 | |
*** sarob has joined #openstack-infra | 04:11 | |
*** erfanian has quit IRC | 04:14 | |
*** D30 has joined #openstack-infra | 04:20 | |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/config: Fix Doc Location for Transifex https://review.openstack.org/51112 | 04:21 |
clarkb | fifieldt: you around? | 04:21 |
fifieldt | yessir clarkb | 04:22 |
fifieldt | the sun is up and doing well | 04:22 |
clarkb | fifieldt: cool. We would like to add Ironic to transifex and I figured I should ask how you would like to go about adding new projects | 04:22 |
fifieldt | right, yes, that procedure should be documented | 04:22 |
fifieldt | I take it you're most interested in the transifex side of things? | 04:23 |
clarkb | I think I have sufficient permissions to do it, but didn't want to be sidestepping things | 04:23 |
clarkb | fifieldt: right | 04:23 |
clarkb | fifieldt: I can send an email or submit a bug or whatever is best for you | 04:23 |
fifieldt | if you want, we can step through it now and just do it? | 04:23 |
clarkb | sure | 04:23 |
fifieldt | and I can update the wiki at the same time | 04:23 |
fifieldt | so, we start in the OpenStack "organisation" on transifex | 04:23 |
fifieldt | https://www.transifex.com/organization/openstack | 04:23 |
fifieldt | at the top of the projects list is the "+ NEW" button | 04:24 |
fifieldt | we type in a name, and description as appropriate | 04:24 |
clarkb | yup, I have clicked the NEW button | 04:24 |
fifieldt | and importantly: set the source language to English (en) | 04:24 |
clarkb | fifieldt: and the name is the project less openstack/ ? | 04:24 |
fifieldt | yes | 04:24 |
fifieldt | the openstack organisation provides the openstack bit | 04:25 |
fifieldt | choose "Permissive Open Source" as the license | 04:25 |
fifieldt | and paste the URL for the source (either github or git.openstack.org) in the "source code URL" box | 04:25 |
fifieldt | once you have created the project, go to its page and click the "Manage" button | 04:26 |
clarkb | fifieldt: does the URL for the source need to be a clonable path? | 04:26 |
clarkb | or is that just a handy link for humans? | 04:26 |
fifieldt | just a handy link for humans | 04:26 |
clarkb | ok I am on the manage page | 04:26 |
*** basha has joined #openstack-infra | 04:27 | |
fifieldt | feel free to fill out a long description, home page, if you want, | 04:27 |
fifieldt | but the important bit here is maintainers | 04:27 |
fifieldt | sorry | 04:27 |
fifieldt | not maintainers | 04:27 |
fifieldt | access control | 04:27 |
fifieldt | set the "Project Type" to "Outsourced project" | 04:27 |
clarkb | fifieldt: under features is a TM check box. should I check that? | 04:27 |
fifieldt | and "Outsource Access to" OpenStack | 04:27 |
fifieldt | yes, that is a good idea clarkb | 04:28 |
clarkb | ok TM check box checked and project outsourced to openstack | 04:28 |
fifieldt | great | 04:28 |
clarkb | now I need to add maintainers | 04:28 |
fifieldt | in theory, that is done through the OpenStack organisation | 04:29 |
clarkb | oh | 04:29 |
fifieldt | but you can add anyone you think is relevant to an individual project | 04:29 |
clarkb | fifieldt: can you check if you have management perms on Ironic? | 04:29 |
clarkb | you haven't been explicitly added but are part of the project hub | 04:29 |
fifieldt | I do indeed | 04:29 |
fifieldt | so no problems with permissions | 04:29 |
clarkb | cool I will leave it as is then | 04:29 |
fifieldt | yay :) | 04:29 |
clarkb | is that it for the transifex side? | 04:30 |
fifieldt | yes | 04:30 |
clarkb | awesome thanks | 04:30 |
fifieldt | well | 04:30 |
fifieldt | there is one thing I'm not 100% sure of | 04:30 |
fifieldt | that is whether there's a need to manually create the "Resources" the first time | 04:30 |
fifieldt | I think the client can do that | 04:30 |
fifieldt | but I'm not 100% sure | 04:30 |
clarkb | I think the client can do that too | 04:30 |
fifieldt | great | 04:30 |
fifieldt | then yes, that should be everything | 04:31 |
clarkb | as other new projects haven't needed to do anything under resources, instead jenkins jobs push to them and they are automagically added | 04:31 |
fifieldt | excellent | 04:31 |
fifieldt | it's good to get confirmation on that | 04:31 |
clarkb | fifieldt: I will try to remember and double check ironic once the jenkins jobs are in place | 04:31 |
fifieldt | cheers | 04:31 |
clarkb | but I haven't heard complaining about it not working so it must work right? :) | 04:31 |
fifieldt | right :) | 04:32 |
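On the client side that the exchange above leaves to "the client can do that", resources are normally driven by a `.tx/config` in the project tree; a hypothetical minimal example for a project like ironic (the resource slug, file paths, and host are illustrative, not taken from the real repo):

```ini
# Hypothetical .tx/config -- slug and paths are invented for illustration.
[main]
host = https://www.transifex.com

[ironic.ironic-translations]
source_lang = en
source_file = ironic/locale/ironic.pot
file_filter = ironic/locale/<lang>/LC_MESSAGES/ironic.po
type = PO
```

With a file like this in place, a push of the source file can create the resource on first upload, which matches the observation that jenkins jobs "automagically" add them.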
fifieldt | https://review.openstack.org/#/c/51112 <-- though, speaking of failing jenkins jobs, how do you feel about this? :) I'd like to get manuals working again :( | 04:32 |
clarkb | devananda: ^ you are ready for the jenkins jobs | 04:32 |
clarkb | fifieldt: 51112 lgtm +2'd | 04:32 |
fifieldt | cheers | 04:32 |
* fifieldt wonders who else he can bother at this insane timezone | 04:33 | |
*** markmcclain has quit IRC | 04:38 | |
fifieldt | dammit clarkb, now I have to check every project to make sure that TM box is ticked :D | 04:40 |
fifieldt | it must be a new option | 04:42 |
fifieldt | they weren't | 04:42 |
fifieldt | nice job on the discovery :) | 04:42 |
*** senk has joined #openstack-infra | 04:42 | |
*** DennyZhang has quit IRC | 04:46 | |
*** senk has quit IRC | 04:47 | |
clarkb | fifieldt: :) | 04:47 |
*** changbl has quit IRC | 04:47 | |
*** changbl has joined #openstack-infra | 04:51 | |
*** DennyZhang has joined #openstack-infra | 04:56 | |
*** sarob has quit IRC | 05:02 | |
*** sarob has joined #openstack-infra | 05:02 | |
*** boris-42 has joined #openstack-infra | 05:04 | |
*** afazekas has joined #openstack-infra | 05:06 | |
*** sarob has quit IRC | 05:07 | |
*** SergeyLukjanov has quit IRC | 05:08 | |
*** afazekas has quit IRC | 05:11 | |
*** ryanpetrello has joined #openstack-infra | 05:17 | |
*** ryanpetrello has quit IRC | 05:18 | |
*** changbl has quit IRC | 05:38 | |
*** senk has joined #openstack-infra | 05:43 | |
*** senk has quit IRC | 05:48 | |
*** cody-somerville has quit IRC | 05:50 | |
*** yaguang has quit IRC | 05:50 | |
*** kong has quit IRC | 05:57 | |
*** Lingxian has joined #openstack-infra | 05:58 | |
openstackgerrit | Endre Karlson proposed a change to openstack-infra/config: Add pypi job to python-libraclient https://review.openstack.org/51069 | 06:05 |
*** DennyZhang has quit IRC | 06:08 | |
*** yolanda has joined #openstack-infra | 06:11 | |
*** sarob has joined #openstack-infra | 06:13 | |
openstackgerrit | Endre Karlson proposed a change to openstack-infra/config: Add / Change python-libraclient jobs https://review.openstack.org/51069 | 06:17 |
*** sarob has quit IRC | 06:18 | |
*** mkoderer_ is now known as mkoderer | 06:37 | |
*** senk has joined #openstack-infra | 06:45 | |
*** senk has quit IRC | 06:50 | |
*** uvirtbot has joined #openstack-infra | 06:52 | |
*** mkerrin has quit IRC | 06:55 | |
*** yamahata has joined #openstack-infra | 06:58 | |
*** mkerrin has joined #openstack-infra | 07:00 | |
*** cody-somerville has joined #openstack-infra | 07:09 | |
*** cody-somerville has quit IRC | 07:09 | |
*** cody-somerville has joined #openstack-infra | 07:09 | |
*** mancdaz_ has quit IRC | 07:14 | |
*** slong has quit IRC | 07:15 | |
*** mancdaz has joined #openstack-infra | 07:15 | |
openstackgerrit | Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version https://review.openstack.org/51131 | 07:15 |
*** cody-somerville has quit IRC | 07:16 | |
openstackgerrit | Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version https://review.openstack.org/51131 | 07:17 |
*** D30 has quit IRC | 07:17 | |
*** D30 has joined #openstack-infra | 07:22 | |
*** bauzas has joined #openstack-infra | 07:22 | |
bauzas | hi all | 07:22 |
bauzas | I'm having trouble with the py27 build for a review: http://logs.openstack.org/70/50970/1/check/gate-climate-python27/5ede61d/console.html | 07:23 |
bauzas | my own tow -r -epy27 works like a charm | 07:23 |
bauzas | s/tow/tox | 07:23 |
bauzas | but the oslo config on the Jenkins VM is incorrect | 07:24 |
bauzas | I checked both Jenkins and tox venvs | 07:24 |
bauzas | and the pip freeze is slightly different | 07:24 |
*** fbo_away is now known as fbo | 07:25 | |
bauzas | oslo.config is the same 1.2.1 | 07:25 |
bauzas | but I found trace of oslo-config on Jenkins | 07:26 |
*** osanchez has joined #openstack-infra | 07:26 | |
bauzas | which is an early build | 07:26 |
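The comparison bauzas is doing can be sketched by diffing the two freeze listings; the package lists below are fabricated (the stale `oslo-config` entry mirrors what he reports finding, the versions are invented):

```shell
# Fabricated freeze output: the Jenkins venv carries a stale
# "oslo-config" (the early package name) alongside "oslo.config".
printf 'oslo.config==1.2.1\nsix==1.4.1\n' > local-freeze.txt
printf 'oslo-config==1.1.0\noslo.config==1.2.1\nsix==1.4.1\n' > jenkins-freeze.txt

# In a real session these would come from
# ".tox/py27/bin/pip freeze | sort" and the Jenkins console log.
diff jenkins-freeze.txt local-freeze.txt || true
```

Only the stale entry shows up in the diff, which makes a leftover early build of oslo-config on the Jenkins VM easy to spot.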
*** D30 has quit IRC | 07:27 | |
*** dafter has joined #openstack-infra | 07:29 | |
*** cody-somerville has joined #openstack-infra | 07:30 | |
*** shardy_afk is now known as shardy | 07:31 | |
*** D30 has joined #openstack-infra | 07:32 | |
*** cody-somerville has quit IRC | 07:37 | |
*** senk has joined #openstack-infra | 07:46 | |
*** senk has quit IRC | 07:51 | |
*** basha has quit IRC | 07:59 | |
*** che-arne has joined #openstack-infra | 08:01 | |
*** luhrs1 has joined #openstack-infra | 08:01 | |
*** jpich has joined #openstack-infra | 08:05 | |
*** dizquierdo has joined #openstack-infra | 08:06 | |
*** odyssey4me has joined #openstack-infra | 08:06 | |
*** yassine has joined #openstack-infra | 08:08 | |
*** amotoki has quit IRC | 08:11 | |
*** derekh has joined #openstack-infra | 08:18 | |
*** odyssey4me has quit IRC | 08:26 | |
*** markmc has joined #openstack-infra | 08:29 | |
*** odyssey4me has joined #openstack-infra | 08:33 | |
*** dizquierdo has quit IRC | 08:38 | |
*** hashar has joined #openstack-infra | 08:41 | |
*** dims has joined #openstack-infra | 08:42 | |
*** dkehn_ has joined #openstack-infra | 08:44 | |
*** yamahata has quit IRC | 08:46 | |
*** dkehn has quit IRC | 08:47 | |
*** senk has joined #openstack-infra | 08:47 | |
*** dims has quit IRC | 08:50 | |
*** senk has quit IRC | 08:52 | |
openstackgerrit | Lucas Alvares Gomes proposed a change to openstack/requirements: Added lower version boundary for netaddr https://review.openstack.org/49530 | 08:55 |
*** dizquierdo has joined #openstack-infra | 09:08 | |
openstackgerrit | Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version https://review.openstack.org/51131 | 09:12 |
sileht | thx fungi I have seen the pypi mirror updated ! | 09:13 |
*** johnthetubaguy has joined #openstack-infra | 09:19 | |
*** basha has joined #openstack-infra | 09:22 | |
*** johnthetubaguy has quit IRC | 09:31 | |
*** johnthetubaguy has joined #openstack-infra | 09:31 | |
*** beagles has joined #openstack-infra | 09:45 | |
openstackgerrit | Mehdi Abaakouk proposed a change to openstack-infra/jenkins-job-builder: Allow macro is dict key https://review.openstack.org/51159 | 09:46 |
*** alexpilotti has joined #openstack-infra | 09:46 | |
*** senk has joined #openstack-infra | 09:48 | |
*** senk has quit IRC | 09:52 | |
*** markmc has quit IRC | 10:02 | |
*** xchu has quit IRC | 10:04 | |
*** alexpilotti has joined #openstack-infra | 10:07 | |
*** pcm_ has joined #openstack-infra | 10:07 | |
*** pcm_ has quit IRC | 10:09 | |
*** pcm_ has joined #openstack-infra | 10:09 | |
*** markmc has joined #openstack-infra | 10:11 | |
*** D30 has quit IRC | 10:12 | |
* ttx juggles with CIVS since it doesn't allow more than 1000 voters | 10:17 | |
*** fifieldt has quit IRC | 10:18 | |
ttx | Fun fact: there is one voter that was left out by CIVS for a mysterious reason and I have no way of determining who it is. | 10:18 |
*** fifieldt has joined #openstack-infra | 10:18 | |
ttx | fungi, jeblair, mordred: multiple failures downloading deps on various jobs | 10:23 |
ttx | http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html | 10:24 |
ttx | looks like network issues | 10:24 |
ttx | doesn't hit the same dep every time | 10:24 |
* ttx lunches | 10:24 | |
sdague | ttx: it doesn't allow more than 1000 voters? | 10:26 |
*** branen has quit IRC | 10:30 | |
openstackgerrit | Endre Karlson proposed a change to openstack-infra/config: Add / Change python-libraclient jobs https://review.openstack.org/51069 | 10:31 |
*** dims has joined #openstack-infra | 10:34 | |
*** mestery has joined #openstack-infra | 10:40 | |
*** hashar_ has joined #openstack-infra | 10:46 | |
*** johnthetubaguy has quit IRC | 10:47 | |
*** johnthetubaguy has joined #openstack-infra | 10:48 | |
*** hashar has quit IRC | 10:48 | |
*** hashar_ is now known as hashar | 10:48 | |
*** mestery has quit IRC | 10:48 | |
*** senk has joined #openstack-infra | 10:49 | |
*** senk has quit IRC | 10:53 | |
*** boris-42 has quit IRC | 10:55 | |
*** guohliu has quit IRC | 11:01 | |
openstackgerrit | Qiu Yu proposed a change to openstack-infra/jeepyb: Print help message and exit if no config file by default https://review.openstack.org/51182 | 11:03 |
*** cody-somerville has joined #openstack-infra | 11:13 | |
soren | ttx: CIVS is free software, IIRC. You might be able to install it somewhere and crank that limit up to eleven... thousand. | 11:17 |
*** michchap has quit IRC | 11:26 | |
*** michchap has joined #openstack-infra | 11:26 | |
*** CaptTofu has quit IRC | 11:27 | |
*** CaptTofu has joined #openstack-infra | 11:27 | |
*** cody-somerville has quit IRC | 11:31 | |
sdague | might need that for next go around. the ATC growth being what it is | 11:32 |
*** hashar has quit IRC | 11:34 | |
ttx | soren: yes, it's a bit weird but I ran it locally recently to test the ability to rerun ballots with alternative algorithms | 11:35 |
ttx | sdague: you can actually send voters in multiple batches of <1000 | 11:35 |
sdague | ah, gotcha | 11:35 |
*** SergeyLukjanov has joined #openstack-infra | 11:35 | |
ttx | sdague: but i wasn't sure of that until I tried and already sent half of them :) | 11:36 |
sdague | heh | 11:37 |
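The batching ttx describes can be sketched with `split` (the voter list here is fabricated; CIVS caps a single upload at 1000 addresses, so batches of 999 stay safely under it):

```shell
# Fabricated voter list: 2500 addresses, one per line.
seq 1 2500 | sed 's/.*/voter&@example.org/' > voters.txt

# Split into chunks of at most 999 addresses: batch-aa, batch-ab, batch-ac.
split -l 999 voters.txt batch-

wc -l batch-*
```

Each `batch-*` file can then be pasted into CIVS as a separate upload, which is the "multiple batches of <1000" approach from the conversation.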
*** SergeyLukjanov is now known as _SergeyLukjanov | 11:37 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 11:37 | |
openstackgerrit | Ekaterina Fedorova proposed a change to openstack-infra/config: Add murano-repository to stackforge https://review.openstack.org/50026 | 11:38 |
ttx | Err... the test nodes graph at http://status.openstack.org/zuul/ looks highly suspicious | 11:41 |
ttx | fungi, jeblair, mordred: ^ may or may not be related with the network issues we're experiencing fetching deps | 11:42 |
ttx | At this rate we'll reach universe entropy in 67 minutes | 11:42 |
ttx | sdague: ever saw something like it ? | 11:43 |
sdague | yeh, that looks crazy | 11:44 |
sdague | I wonder if the network timeouts are preventing the node builds | 11:44 |
sdague | which would make sense | 11:44 |
ttx | sdague: yes, definitely started to appear at around the same time | 11:44 |
sdague | so they enter that state, but stall out | 11:44 |
sdague | and the system is correctly trying to build more, because it's not getting any out the other side | 11:45 |
sdague | because we are definitely backed up on devstack nodes | 11:45 |
ttx | it's like watching a train wreck in slow motion | 11:45 |
ttx | good thing I got most of my patches merged earlier. | 11:46 |
sdague | heh | 11:46 |
sdague | who knew that skynet would need this much care and feeding | 11:46 |
ttx | sdague: I was thinking of issuing a statusbot alert. | 11:47 |
sdague | probably fair | 11:47 |
ttx | on it | 11:47 |
ttx | #status notice Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun) | 11:49 |
*** senk has joined #openstack-infra | 11:49 | |
ttx | I like how every time I need to use that bot it miserably fails | 11:50 |
ttx | where the heck is openstackstatus bot | 11:50 |
*** basha has quit IRC | 11:50 | |
*** senk has quit IRC | 11:53 | |
-ttx- Top issues right now: (1) test node starvation (2) networking issues fetching dep (might be the cause of 1) and (3) no statusbot to warn people | 11:54 | |
ttx | fungi, jeblair, mordred: ^ | 11:55 |
*** thomasm has joined #openstack-infra | 11:55 | |
*** thomasm has quit IRC | 11:55 | |
*** thomasm has joined #openstack-infra | 11:56 | |
openstackgerrit | Tom Fifield proposed a change to openstack-infra/config: Fix Doc Location for Transifex https://review.openstack.org/51112 | 11:56 |
ttx | sdague: wondering if we are not past the peak of network issues and starting to gradually recover | 11:58 |
ttx | looking at the graph and the status of the very few tests that run | 11:58 |
sdague | yeh, could be | 11:59 |
*** boris-42 has joined #openstack-infra | 11:59 | |
*** basha has joined #openstack-infra | 12:00 | |
sdague | so in one of the ways to make skynet smarter, I wonder if we should consider auto respooling checks that hit a network timeout | 12:01 |
ttx | sdague: that wouldn't make it smarter, but would certainly make it more resilient | 12:02 |
sdague | yeh | 12:02 |
*** basha has quit IRC | 12:02 | |
*** markmc has quit IRC | 12:02 | |
*** dizquierdo has quit IRC | 12:03 | |
*** w_ has joined #openstack-infra | 12:04 | |
*** markmc has joined #openstack-infra | 12:05 | |
*** CaptTofu has quit IRC | 12:05 | |
*** olaph has quit IRC | 12:05 | |
*** CaptTofu has joined #openstack-infra | 12:06 | |
*** dprince has joined #openstack-infra | 12:10 | |
*** cody-somerville has joined #openstack-infra | 12:22 | |
mordred | yay! things have fixed themselves before I woke up? | 12:23 |
BobBall | they knew you were coming | 12:23 |
BobBall | and were scared... | 12:23 |
mordred | BobBall: ++ | 12:24 |
thomasm | 'Tis a good day. | 12:26 |
*** thomasbiege has joined #openstack-infra | 12:30 | |
*** dcramer_ has quit IRC | 12:31 | |
*** adalbas has joined #openstack-infra | 12:32 | |
*** matsuhashi has quit IRC | 12:33 | |
*** matsuhashi has joined #openstack-infra | 12:34 | |
*** aspiers has quit IRC | 12:34 | |
*** nosnos has quit IRC | 12:35 | |
*** nosnos has joined #openstack-infra | 12:35 | |
*** aspiers has joined #openstack-infra | 12:38 | |
openstackgerrit | Roman Podolyaka proposed a change to openstack-infra/config: Fix sqlalchemy-migrate py26/sa07 job https://review.openstack.org/44686 | 12:38 |
*** matsuhashi has quit IRC | 12:38 | |
*** dafter has quit IRC | 12:40 | |
*** nosnos has quit IRC | 12:40 | |
*** dafter has joined #openstack-infra | 12:41 | |
ttx | mordred: no | 12:41 |
*** weshay has joined #openstack-infra | 12:41 | |
ttx | mordred: start with the scary "test nodes" graph @ http://status.openstack.org/zuul/ | 12:41 |
ttx | mordred: then look at download fails @ http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html | 12:42 |
ttx | (might be same issue around networking) | 12:42 |
*** hashar has joined #openstack-infra | 12:42 | |
ttx | mordred: then finally, where the heck is statusbot when you need it ? | 12:42 |
ttx | mordred: gate is totally wedged right now. | 12:43 |
fifieldt | that looks awesome | 12:44 |
bauzas | sdague: ping ? | 12:44 |
fifieldt | but the amount of scrolling was annoying to get to the graphs ;) | 12:44 |
ttx | fifieldt: if I didn't need it urgently for RC2 production I would probably find it funny too | 12:44 |
bauzas | sdague: I'm now at my office, still broken about my oslo.config version | 12:44 |
bauzas | btw, maybe ppl could help me ? | 12:45 |
fifieldt | sorry ttx :) 2345 here and the brain is off, it seems | 12:45 |
* ttx sees his weekend vanish | 12:45 | |
bauzas | http://logs.openstack.org/70/50970/1/check/gate-climate-python27/5ede61d/console.html | 12:45 |
bauzas | oslo-config got pulled from Jenkins while it shouldn't | 12:45 |
bauzas | my own tox venv on my laptop doesn't get this pretty old oslo-config beta version | 12:45 |
*** dafter has quit IRC | 12:46 | |
bauzas | the gate should be fine | 12:46 |
mordred | why are we timing out on fetches from pypi.o.o ? | 12:46 |
*** jhesketh has quit IRC | 12:46 | |
ttx | mordred: you tell me | 12:46 |
*** openstackstatus has joined #openstack-infra | 12:47 | |
mordred | ok. there's statusbot | 12:47 |
ttx | yay, a bot | 12:47 |
ttx | #status notice Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun) | 12:48 |
openstackstatus | NOTICE: Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun) | 12:48 |
*** basha has joined #openstack-infra | 12:48 | |
openstackgerrit | Renat Akhmerov proposed a change to openstack-infra/config: Add configuration for Mistral project https://review.openstack.org/51205 | 12:49 |
*** dhouck_ has joined #openstack-infra | 12:50 | |
* ttx goes to get some fresh air | 12:50 | |
openstackgerrit | Emilien Macchi proposed a change to openstack-infra/config: Add IRC bot on #openstack-rally for Gerrit changes https://review.openstack.org/51207 | 12:58 |
*** dkehn_ is now known as dkehn | 12:58 | |
*** basha has quit IRC | 12:58 | |
*** CaptTofu has quit IRC | 13:00 | |
*** CaptTofu has joined #openstack-infra | 13:00 | |
ttx | mordred: fwiw we might be past the peak of networking issues and slowly recovering | 13:02 |
yolanda | hi, i'm trying to create users automatically in gerrit, they are created, correctly assigned to groups, but when i click on their links (aka /#/dashboard/xxxx), it shows me a not found page, what could be the issue there? | 13:02 |
ttx | mordred: there are a few successful test runs by now, a few hours earlier they were all failing | 13:03 |
yolanda | i can see the /dashboard/ url for the logged user, but not for others, although i'm logged with an admin user | 13:03 |
ttx | mordred: hard to tell more from where I stand | 13:03 |
*** miqui has joined #openstack-infra | 13:05 | |
* mordred now useless and on the phone once more | 13:05 | |
*** blamar has joined #openstack-infra | 13:07 | |
*** michchap has quit IRC | 13:11 | |
*** michchap has joined #openstack-infra | 13:14 | |
*** michchap has quit IRC | 13:16 | |
*** julim has joined #openstack-infra | 13:16 | |
*** sandywalsh_ has quit IRC | 13:16 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Fix test_files_at_url_pass https://review.openstack.org/50706 | 13:16 |
*** DennyZhang has joined #openstack-infra | 13:16 | |
*** mriedem has joined #openstack-infra | 13:18 | |
*** basha has joined #openstack-infra | 13:19 | |
fungi | having a look | 13:22 |
ttx | fungi: network issues preventing dep fetching, potentially also the cause of test nodes starvation | 13:22 |
ttx | (executive summary) | 13:23 |
ttx | see scary "test nodes" graph @ http://status.openstack.org/zuul/ and example dep fetching fail @ http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html | 13:23 |
*** matty_dubs|gone is now known as matty_dubs | 13:25 | |
fungi | yeah, looking at graphs and checking rackspace's network status info | 13:25 |
*** thedodd has joined #openstack-infra | 13:27 | |
fungi | yeah, rs mentions no current issues and no posted maintenance for today | 13:27 |
*** sandywalsh has joined #openstack-infra | 13:28 | |
sdague | mordred: having an issue with the cookiecutter repo - http://paste.openstack.org/show/48266/ | 13:32 |
fungi | mmm, the /srv/static/doc filesystem on static.o.o is slap full. not sure whether that's having an impact but i'll give it a little more breathing room | 13:32 |
*** markmcclain has joined #openstack-infra | 13:33 | |
fungi | er, /srv/static/docs-draft (cacti truncated the label in its graph) | 13:33 |
fifieldt | there have been many more doc patches than normal of late | 13:33 |
*** russellb is now known as rustlebee | 13:36 | |
sdague | oh, never mind | 13:37 |
fungi | i'm increasing it by about 25% for now and then we can discuss whether we purge drafts more aggressively or add still more space | 13:37 |
sdague | mordred: it's probably a good idea to kill - https://github.com/emonty/cookiecutter-openstack as it is a high hit for openstack cookiecutter | 13:37 |
*** dafter has joined #openstack-infra | 13:38 | |
*** dafter has quit IRC | 13:38 | |
*** dafter has joined #openstack-infra | 13:38 | |
mordred | sdague: ++ | 13:40 |
*** dizquierdo has joined #openstack-infra | 13:41 | |
fungi | so, on the nodepool.o.o graphs i see gaps around the time the server building volume increases there on the graph. either it went to lunch and stopped responding to snmp for ~20 minutes or there was a network blip (but i don't find gaps from the same time period for other hosts) | 13:41 |
fungi | i'll start digging into logs on the nodepool server | 13:41 |
*** bnemec is now known as beekneemech | 13:44 | |
fungi | there were some errors in the nodepool image log around 0230 utc, but that's way earlier than the symptoms began and i don't see a recurrence there | 13:45 |
ttx | fungi: is networking working now on those machines ? | 13:45 |
fungi | seems fine at the moment. i've got a ping test going to static.o.o right now as well as a few devstack slaves | 13:46 |
*** fifieldt has quit IRC | 13:47 | |
ttx | fungi: the test nodes building graph still goes up the roof | 13:48 |
*** basha has quit IRC | 13:48 | |
fungi | ttx: yeah, hoping the logs will give me some inkling of why | 13:48 |
*** beagles is now known as seagulls | 13:49 | |
ttx | fungi: our collective guess was that they stalled on dep loading | 13:49 |
ttx | fungi: the issue might be gone now but they are still preventing new ones from being spun | 13:49 |
*** aspiers has quit IRC | 13:49 | |
* ttx wonders how much radical killing would be a solution at this point | 13:49 | |
fungi | as of this week, nodepool will try to proactively build additional servers based on perceived demand for waiting jobs so that may be what we're seeing on the graphs | 13:49 |
ttx | all I can say is that status has not moved in gate for the last 5 hours | 13:50 |
jd__ | ttx: are you threatening fungi? ;) | 13:50 |
*** alaski is now known as lascii | 13:50 | |
fungi | but yes it could be a symptom of network issues in hpcloud, though i'm finding no evidence of that yet | 13:50 |
ttx | as in.. same jobs are waiting for resources | 13:51 |
ttx | so my guess is that no new test resources are made available. It's not slow, it's stuck | 13:51 |
*** dcramer_ has joined #openstack-infra | 13:51 | |
fungi | nodepool's "alien" list (servers it sees but didn't create) is fairly long. not sure if that's a related symptom | 13:52 |
*** prad_ has joined #openstack-infra | 13:52 | |
*** aspiers has joined #openstack-infra | 13:53 | |
fungi | but nodepool is definitely building and deleting servers based on what i see in its logs, and doesn't mention any serious issues so it could be a tuning problem | 13:53 |
ttx | fungi: there hasn't been a devstack being run that I could see in the last.. 4 hours now | 13:54 |
fungi | oh, nevermind. most of those are our other non-devstack slaves in rackspace | 13:54 |
ttx | right | 13:54 |
fungi | the alien nodes it lists, i mean | 13:54 |
ttx | (we still get pep8 tests run) | 13:54 |
*** pabelanger has joined #openstack-infra | 13:55 | |
ttx | 4h20min to be precise | 13:55 |
ttx | but the networking issue is gone in the latest non-devstack runs we see | 13:56 |
fungi | there are definitely still *some* devstack jobs running... https://jenkins01.openstack.org/job/check-tempest-devstack-vm-full/1954/console | 13:56 |
*** sarob has joined #openstack-infra | 13:57 | |
fungi | ahh, here we go | 13:57 |
ttx | fungi: yes, a dozen of them in the check line | 13:57 |
ttx | none in the gate | 13:57 |
fungi | there are no hpcloud slaves in jenkins, only rackspace | 13:58 |
fungi | that arms me with something more i can look for | 13:58 |
*** DennyZhang has quit IRC | 14:01 | |
ttx | sigh, looks like a busy saturday coming up for me | 14:01 |
*** marun has quit IRC | 14:02 | |
*** DennyZhang has joined #openstack-infra | 14:02 | |
jd__ | ttx: yup, i'll try to be available too if you want to handle Ceilometer RC2 tomorrow | 14:02 |
ttx | even if we solved it now the lines are so long I won't get the RC2 stuff in before eod | 14:02 |
fungi | http://paste.openstack.org/show/48272/ | 14:02 |
ttx | and I have family visiting, yay | 14:02 |
*** sarob has quit IRC | 14:02 | |
jd__ | 191 building? O_o is that normal? | 14:03 |
fungi | most hpcloud nodes are in a building state and many deleting with very few ready, similar to the overall nodes graph on the zuul status page | 14:03 |
*** changbl has joined #openstack-infra | 14:03 | |
ttx | fungi: all our gate testing goes to hp nodes ? | 14:03 |
fungi | well, i know we've said in the past that we throw away something like 75% of the slaves we build on hpcloud because after waiting a couple minutes for them to boot they never show up | 14:03 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 14:04 | |
fungi | ttx: yes, we have very few rackspace nodes (much lower quotas) and the slaves are slower | 14:04 |
*** _SergeyLukjanov is now known as SergeyLukjanov | 14:04 | |
ttx | maybe HP asked all their servers to work back in their datacenters | 14:04 |
fungi | looking to see if i can figure out what's up with hpcloud and hopefully we can get this back on track | 14:04 |
* ttx hesitates to cut new RC2s right now, fearing that pre-release jobs would get queued forever | 14:06 | |
*** hashar has quit IRC | 14:11 | |
*** yassine has quit IRC | 14:13 | |
fungi | #status alert The Infrastructure team is working through some devstack node starvation issues which is currently holding up gating and slowing checks. ETA 1600 UTC | 14:14 |
openstackstatus | NOTICE: The Infrastructure team is working through some devstack node starvation issues which is currently holding up gating and slowing checks. ETA 1600 UTC | 14:14 |
*** ChanServ changes topic to "The Infrastructure team is working through some devstack node starvation issues which is currently holding up gating and slowing checks. ETA 1600 UTC" | 14:14 | |
mordred | fungi: wow. we have no HP nodes? | 14:14 |
fungi | mordred: well, we have a ton, but... we're not using them | 14:14 |
mordred | hrm | 14:14 |
fungi | mordred: http://paste.openstack.org/show/48273/ | 14:15 |
fungi | i'm hunting for any real error to explain it | 14:15 |
fungi | we have a handful of expected errors in the nodepool log for things like timeouts deleting servers, but they're few and far between and not for several hours now | 14:18 |
fungi | stuff that gets retried and would have errored again if it kept happening | 14:18 |
annegentle_ | node starvation sounds serious! Rooting for you guys. | 14:20 |
fungi | annegentle_: thanks! | 14:21 |
*** rahmu has quit IRC | 14:24 | |
fungi | the handful of devstack servers nodepool knows about in hpcloud are also not getting used. i just ssh'd into one of them and it had an uptime of 20 hours | 14:25 |
fungi | ssh'd into one in a deleting state and it's got an uptime of 6 hours. i suspect delete (and maybe build?) calls are not being respected | 14:26 |
*** yassine has joined #openstack-infra | 14:28 | |
*** blamar has quit IRC | 14:28 | |
fungi | novaclient itself shows the same server as "active" state | 14:28 |
mordred | fungi: fantastic | 14:29 |
mordred | fungi: anything I can do to help? | 14:29 |
* mordred is off phone now | 14:29 | |
fungi | no idea. poke at things. i'm still casting my net wide | 14:30 |
fungi | doing a nova delete of that "deleting" node seems to work | 14:30 |
fungi | and nodepool is still listing that node in a "delete" state even after it's gone in hpcloud | 14:31 |
mordred | fungi: that sounds very weird | 14:31 |
fungi | maybe nodepool just hasn't noticed yet? (or maybe it doesn't expect anyone else to delete its nodes) | 14:32 |
fungi | anyway, since none of the devstack-gate slaves in hpcloud are currently being used, i'm thinking maybe we delete them all and... nodepool is stateless, right? just restart it? | 14:33 |
*** ruhe has joined #openstack-infra | 14:35 | |
fungi | but i'm uneasy going behind its back and making changes, restarting it and losing state which might help point us to the actual error, et cetera | 14:35 |
mordred | yeah. I'm very shaky on doing things to nodepool without jeblair | 14:40 |
*** wenlock has joined #openstack-infra | 14:40 | |
*** datsun180b has joined #openstack-infra | 14:41 | |
fungi | i only just noticed that one of the columns in nodepool list's output is age in hours. quite a few of the "building" nodes have an age over 4 hours | 14:41 |
fungi | those are the oldest in that state and that's about the timeframe where we started seeing issues, judging from the graphs | 14:42 |
*** cody-somerville has quit IRC | 14:43 | |
fungi | actually some almost 6 hours old | 14:44 |
fungi | around 0850 utc | 14:44 |
fungi | all the nodes in a nodepool delete state are not much older than that. maybe 40 minutes older, tops | 14:46 |
fungi | from around 0825 | 14:46 |
*** cody-somerville has joined #openstack-infra | 14:46 | |
*** pentameter has joined #openstack-infra | 14:47 | |
openstackgerrit | Masashi Ozawa proposed a change to openstack/requirements: Set boto minimum version https://review.openstack.org/51131 | 14:48 |
fungi | so i think starting around thenish, hpcloud ceased acting on any nova delete or create calls. maybe nodepool lost a persistent connection and didn't realize/retry? | 14:48 |
fungi | it has established https sockets (so it thinks) to addresses very similar to what the hpcloud service endpoint resolves to. sniffing now to see if those are actually dead connections | 14:52 |
openstackgerrit | Qiu Yu proposed a change to openstack-infra/jeepyb: Print help message and exit if no config file by default https://review.openstack.org/51182 | 14:52 |
fungi | it's been a couple minutes already and i see no traffic at all to/from those addresses | 14:53 |
*** rcleere has joined #openstack-infra | 14:54 | |
*** DennyZhang has quit IRC | 14:54 | |
*** basha has joined #openstack-infra | 14:55 | |
fungi | oho, so around 0850 nodepool *did* log this gem... ConnectionError: HTTPSConnectionPool(host='ord.servers.api.rackspacecloud.com', port=443): Max retries exceeded with url: /v2/637776/servers/a14b333a-9b03-48c8-b144-4f21a3eec405 (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer) | 14:55 |
fungi | nevermind. that was rackspace | 14:55 |
fungi | pretty large uptick in ssh timeouts waiting for servers to launch around that timeframe too | 14:58 |
*** dansmith is now known as Steely_Dan | 14:59 | |
mordred | spectacular | 15:00 |
mordred | so have we perhaps exceeded another timeout threshold? | 15:00 |
fungi | not sure. also i've been sniffing for any traffic to/from 168.87.243.0/24 (where the hpcloud service endpoint resolves into and where nodepool claims to have a couple established https sockets to remote systems) and so far not a single packet for over 10 minutes | 15:02 |
fungi | probably much, much longer, but at least none since i started up tcpdump | 15:02 |
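The failure mode fungi is sniffing for here -- sockets the process still believes are ESTABLISHED while the peer is long gone -- is what TCP keepalives exist to detect. A hedged sketch (illustrative only, not nodepool's actual code; the `TCP_KEEP*` constants are Linux-specific, hence the `hasattr` guards):

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Ask the kernel to probe an idle connection so a silently dead
    peer is detected, instead of the socket sitting in ESTABLISHED
    forever. Sketch only -- parameter values are arbitrary examples."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for opt, val in (("TCP_KEEPIDLE", idle),      # seconds before first probe
                     ("TCP_KEEPINTVL", interval), # seconds between probes
                     ("TCP_KEEPCNT", count)):     # probes before giving up
        if hasattr(socket, opt):  # these constants are platform-dependent
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, opt), val)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # nonzero when set
```

Whether the client library in use (novaclient, in this case) exposes the underlying socket to set this on is a separate question.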
*** boris-42 has quit IRC | 15:03 | |
*** basha has quit IRC | 15:04 | |
dkranz | mordred: I have a process going that is reading the console log for every successful tempest run (listening to gerrit) looking for reported bogus errors. Is that going to annoy any one? | 15:05 |
*** blamar has joined #openstack-infra | 15:05 | |
*** cody-somerville has quit IRC | 15:05 | |
fungi | dkranz: unlikely. are you pulling those console logs from logs.openstack.org? | 15:06 |
dkranz | fungi: Yes. | 15:06 |
fungi | i didn't notice any huge uptick in outbound network utilization on it at any rate | 15:07 |
dkranz | fungi: ok, cool. Just don't want to be a bad citizen... | 15:07 |
dkranz | fungi: This will stop once we start failing builds that have bogus errors (or real ones). | 15:07 |
fungi | dkranz: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=311&rra_id=all | 15:07 |
*** thingee_zzz is now known as thingee | 15:08 | |
fungi | i mean, yes, it's a lot of traffic but it's not at worrying levels, i don't think | 15:08 |
sdague | fungi / clarkb I'm respinning the htmlify-screen-logs.py into an os_loganalyze repository so I can do some sane test additions before upping the complexity for other log times | 15:09 |
sdague | log types | 15:09 |
fungi | though we do seem to have topped out at 100mbps briefly last week | 15:09 |
sdague | should I just github this until good, then pull it back into openstack-infra? | 15:09 |
fungi | sdague: whatever's easy for you. we can import later or you can start off with a basic cookiecutter | 15:09 |
sdague | or would we want it as a gerrit repo earlier | 15:09 |
sdague | fungi: yeh, I started with a cookiecutter | 15:09 |
*** yassine has quit IRC | 15:10 | |
sdague | so it should play nicely later | 15:10 |
fungi | i mean you can start out with your basic cookiecutter in gerrit or you can import it once it's usable--your call | 15:10 |
sdague | I was amused the cookiecutter has 3 pep8 errors in it | 15:10 |
fungi | patches welcome! | 15:10 |
sdague | yeh, I guess it's probably faster to get to working unit tests with me just committing and pushing | 15:10 |
*** mkerrin has quit IRC | 15:11 | |
jeblair | sdague: have you kept up with the -infra thread on log storing/serving? | 15:12 |
sdague | not as much as I probably should | 15:13 |
ttx | jeblair! | 15:13 |
sdague | sorry, it's been one of those weeks | 15:13 |
jeblair | sdague: short: there's an idea that we might want to preprocess logs and statically serve them instead of using the wsgi app | 15:13 |
sdague | jeblair: ok | 15:13 |
jeblair | sdague: i don't think that invalidates any of your past or planned work, but if we decide to go that way, it may change how we use it a bit | 15:13 |
sdague | so that wouldn't let us do the filtering that we're doing now, which is nice | 15:13 |
ttx | jeblair: in case you're in "holy batman, what a backlog" mode, we are currently out of HP devstack nodes, effectively blocking the gate.. for the last 6 hours | 15:14 |
sdague | the filtering being nice, that is | 15:14 |
fungi | jeblair: thoughts on why nodepool is not talking to hpcloud since around 0825 utc (that's the best i've been able to pin details down so far) | 15:14 |
jeblair | sdague: i think we'd only do that if we found a way to accomplish the goals we get by filtering; anyway, your input is very welcome. | 15:14 |
jeblair | sdague: it's all a bit brainstormy right now -- nothing urgent | 15:14 |
jeblair | fungi: i'll go look | 15:14 |
sdague | sure, we going to do a session in HK? | 15:15 |
ttx | jeblair: may or may not be related to network failures we noticed in test jobs fetching deps around the same time (which seem to be fixed now) | 15:15 |
jeblair | sdague: let's | 15:15 |
jeblair | fungi: nodepool has _extensive_ logging | 15:15 |
sdague | that would be good brainstormy time for it. Right now I'd just like to get this to a realm where we aren't dropping all the keystone logs for logstash :) | 15:15 |
fungi | jeblair: yes, i've been trying to make sense of the logging and correlate it to the behavior we're seeing | 15:15 |
jeblair | sdague: yeah; one of the participants on the thread isn't going to make it to hk, so i'm trying to lay some groundwork over email | 15:16 |
sdague | yep, no worries | 15:16 |
sdague | who we going to miss in HK? | 15:16 |
jeblair | sdague: jhesketh | 15:16 |
sdague | ok, gotcha | 15:16 |
fungi | it's not saying things like "i'm trying to build servers and they're never appearing" (in fact it's not saying much at all--i think it's waiting for hours for them to become ready) | 15:16 |
ttx | fungi: could the networking issues have caused permanent damage to that nodepool/HPcloud link ? | 15:16 |
jeblair | fungi: yeah, it looks like they're all stuck in building, and errored out in such a way that the cleanup code failed | 15:17 |
fungi | ttx: i'm not sure. i'm thinking maybe the tcp sockets to the service endpoint are actually dead and the nodepool server still believes them to be in an established state | 15:17 |
*** sandywalsh has quit IRC | 15:17 | |
jeblair | fungi: if you run 'nodepool list' you'll see a lot of 'None' values in the db | 15:17 |
fungi | right, i definitely saw that | 15:17 |
jeblair | like this: http://paste.openstack.org/show/48279/ | 15:18 |
fungi | i also saw the periodic cleanup error, but i expected it to periodically error if it was continuing to have problems, being a periodic cleanup | 15:18 |
fungi | however, it only complained once, then was silent | 15:18 |
jeblair | fungi: so for some reason, we set the cleanup delay for non-ready to 8 hours | 15:18 |
jeblair | fungi: so it's going to wait another 2 hours before it starts deleting these | 15:19 |
jeblair | so good news: it would probably fix itself in 2 hours. :) | 15:19 |
jeblair | we should probably adjust that timing a bit. | 15:19 |
fungi | got it. would have fixed itself while we slept if only it had started sooner | 15:19 |
fungi | so what's the safest way to manually clean those up in the future? | 15:20 |
jeblair | fungi: oh, i may have been wrong -- it may not have failed, you might be right -- it may actually have a couple hundred threads waiting for something | 15:20 |
fungi | delete queries in the db? | 15:20 |
jeblair | i want to spend a minute and try to find out if that's the case | 15:20 |
fungi | certainly. i was very hesitant to disturb its current state lest i lose valuable evidence of the issue | 15:21 |
jeblair | fungi: do you have any logged errors handy? | 15:21 |
fungi | jeblair: not pasted yet, but i can do that | 15:21 |
*** branen has joined #openstack-infra | 15:22 | |
jeblair | ttx: can you tell me about the networking issues? | 15:23 |
*** beekneemech has quit IRC | 15:23 | |
*** sandywalsh has joined #openstack-infra | 15:24 | |
*** bnemec has joined #openstack-infra | 15:24 | |
ttx | jeblair: most tests suddenly started to fail with dep download errors like http://logs.openstack.org/45/51145/1/check/gate-heat-pep8/727444a/console.html | 15:24 |
fungi | jeblair: nodepool tracebacks in the log from around the time this started (though quieted down to some ssh timeout errors and then nothing of note for hours): http://paste.openstack.org/show/48280/ | 15:25 |
*** bnemec has quit IRC | 15:25 | |
ttx | at around the same time, the "test nodes" graph on status/zuul started to drink heavily | 15:25 |
jeblair | i note that we have servers stuck in build from both rax and hpcloud | 15:26 |
jeblair | the 'waiting for deletion' timeouts are mostly for hpcloud, but there's one rax | 15:27 |
ttx | jeblair: the starvation only appears to affect devstack/gate nodes | 15:27 |
*** markmcclain has quit IRC | 15:28 | |
jeblair | i don't see anything on rax status, and nothing relevant on hpcloud status | 15:28 |
fungi | jeblair: it might have been a network disruption local to the nodepool server | 15:30 |
fungi | i saw a gap in its cacti graphs from that timeperiod, but couldn't correlate it to any other systems | 15:30 |
*** rnirmal has joined #openstack-infra | 15:32 | |
jeblair | gdb says all the threads are sitting in sem_wait | 15:33 |
jeblair | (well, most of them) | 15:33 |
*** anteaya has joined #openstack-infra | 15:33 | |
*** markmc has quit IRC | 15:33 | |
jeblair | which is really weird because the one locking thing nodepool does is to use queue.Queue which handles all the locking internally | 15:34 |
fungi | though the gap is actually a little later than the logged errors... seeing it span 0915-0925 roughly while we were seeing deletion and ssh errors in the nodepool log an hour prior | 15:34 |
jeblair | ok, so i think i want to do the following: add a thread-dump handler to nodepool like zuul has | 15:35 |
jeblair | consider using dequeue instead of queue | 15:35 |
jeblair | i think the immediate cause of this may be a mystery for now | 15:35 |
jeblair | but if it happens again, hopefully the thread dump handler will help | 15:36 |
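The thread-dump handler jeblair proposes can be sketched with `sys._current_frames()` wired to a signal. This is a minimal hypothetical version, not the actual zuul or nodepool implementation, and it assumes SIGUSR2 is otherwise unused by the daemon:

```python
import signal
import sys
import threading
import traceback

def dump_threads(signum=None, frame=None):
    """Write a stack trace for every live thread to stderr, so a
    wedged daemon can be inspected (e.g. to see threads stuck in
    sem_wait) without restarting it and losing state."""
    names = {t.ident: t.name for t in threading.enumerate()}
    for ident, stack in sys._current_frames().items():
        sys.stderr.write("Thread %s (%s):\n" % (ident, names.get(ident, "?")))
        sys.stderr.write("".join(traceback.format_stack(stack)))

# `kill -USR2 <pid>` now produces a dump instead of killing the process.
signal.signal(signal.SIGUSR2, dump_threads)
```

With something like this in place, the "mystery for now" below could have been answered in the moment instead of reconstructed from gdb.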
fungi | at least we know where to focus debugging the next time this happens, and possibly minimize the disruption as well | 15:36 |
*** markmc has joined #openstack-infra | 15:36 | |
jeblair | yeah. my thinking is that it's either a thread-related bug (which is really weird because that's hard to imagine except for a bug in the stdlib) | 15:36 |
jeblair | or it could be a novaclient bug, where all of the novaclient client objects are stuck doing something | 15:37 |
jeblair | (which may have been triggered by the host/network weirdness) | 15:37 |
jeblair | so, for cleanup: | 15:37 |
fungi | and then the manual cleanup for now is, what, shut down nodepool, run a delete query to remove any machines in a building or delete state manually and nova delete any of the failed deletes themselves, then start nodepool again? | 15:38 |
jeblair | fungi: close | 15:38 |
jeblair | fungi: i'd get the list of machines we want to delete from nodepool, restart it, then 'nodepool delete' each of them | 15:38 |
fungi | oh, that's nicer | 15:39 |
jeblair | fungi: (nodepool should be capable of deleting anything it has a record for) | 15:39 |
fungi | will nodepool delete work on aliens too? | 15:39 |
fungi | i guess not, since no record | 15:39 |
jeblair | fungi: then we can also use nodepool alien-list to get the others, and unfortunately no, we'll have to nova delete those | 15:39 |
fungi | that's easy enough | 15:39 |
fungi | okay, i can tackle that while you get to breakfast | 15:40 |
jeblair | fungi: how did you know? :) | 15:40 |
jeblair | fungi: nodepool list |grep building|awk '{print $2}' | 15:40 |
fungi | heh | 15:40 |
jeblair | fungi: is very handy | 15:40 |
jeblair | fungi: nodepool list |grep building|awk '{print "nodepool delete " $2}' | 15:40 |
jeblair | fungi: actually that's even handier | 15:40 |
fungi | yup. i was using cut to a similar effect, but maybe a slightly more machine-parsable format option would be a nice furture addition | 15:41 |
fungi | s/furture/future/ | 15:41 |
jeblair | fungi: i'd recommend taking that list and splitting it into about 5 parts or so, and then background 5 scripts running through that | 15:41 |
*** blamar has quit IRC | 15:41 | |
jeblair | fungi: to balance speed vs likelihood of hitting an api rate limit | 15:41 |
fungi | yeah, don't want to get throttled | 15:41 |
fungi | right | 15:41 |
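jeblair's one-liner plus the "split into about 5 parts" advice amounts to rate-limited parallel deletion. A sketch of the same cleanup in Python -- the `nodepool delete` command is from the log above, while the chunking and threading scaffolding is illustrative (it assumes the stuck node ids have already been collected from `nodepool list`):

```python
import subprocess
import threading

def chunk(items, n):
    """Deal items round-robin into n batches."""
    batches = [[] for _ in range(n)]
    for i, item in enumerate(items):
        batches[i % n].append(item)
    return batches

def delete_batch(ids):
    # Each batch runs its deletes serially, keeping total concurrency
    # at the number of batches rather than the number of nodes.
    for node_id in ids:
        subprocess.call(["nodepool", "delete", node_id])

def delete_all(ids, workers=5):
    """Five parallel workers: fast enough to be useful, small enough
    to stay under the cloud provider's API rate limit."""
    threads = [threading.Thread(target=delete_batch, args=(b,))
               for b in chunk(ids, workers) if b]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Alien nodes (known to the cloud but not to nodepool) would still need `nova delete`, as noted later in the log.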
*** blamar has joined #openstack-infra | 15:42 | |
fungi | okay, shutting down nodepool now and getting started on that unless you need anything else from the running process first | 15:42 |
sdague | fungi: cookiecutter question ... why does this pass tests - https://github.com/sdague/os_loganalyze | 15:42 |
sdague | I set up an assertTrue(False) in there to ensure it broke correctly, and no dice | 15:42 |
jeblair | fungi: nope, go for it; i'd go ahead and restart nodepool immediately though so it can better keep up with the still-running check nodes | 15:43 |
fungi | got it--will do jeblair | 15:43 |
* ttx will get drunk now to forget he'll have to work over the weekend to catch up | 15:44 | |
*** sarob has joined #openstack-infra | 15:45 | |
*** amotoki has joined #openstack-infra | 15:45 | |
sdague | ttx: hopefully with some nice wine | 15:46 |
*** ruhe has quit IRC | 15:46 | |
jeblair | fungi: i can start work on the alien deletions if you want | 15:46 |
anteaya | ttx keep that Hawaiian shirt handy, you never know | 15:46 |
fungi | jeblair: sure, i've got the building deletions going now | 15:47 |
*** Steely_Dan is now known as Steely_Spam | 15:48 | |
jeblair | fungi: cool i'm on it then | 15:48 |
fungi | 5 separate batches of ~50 each | 15:48 |
pabelanger | So, should I expect tox to run properly after I use cookiecutter for the first time? | 15:48 |
fungi | pabelanger: you and sdague seem to possibly be asking the same question | 15:48 |
* ttx will bbl | 15:48 | |
pabelanger | fungi, okay cool | 15:49 |
pabelanger | I think it missed setting up versioning | 15:49 |
pabelanger | for defaulting to something | 15:49 |
fungi | if you don't figure it out among yourselves shortly, i'll have a look once i wrap up the current firefight | 15:49 |
sdague | fungi: so my issue is actually subunit discover doesn't seem to find any tests | 15:50 |
sdague | and "passes" because of it | 15:50 |
*** alcabrera has joined #openstack-infra | 15:50 | |
fungi | sdague: hrm, maybe the search path in the tox.ini is too strict? | 15:50 |
sdague | I don't think so | 15:51 |
sdague | if I manually venv, and run | 15:51 |
sdague | ./bin/python -m subunit.run discover -t ./ . --list | 15:51 |
sdague | nothing | 15:51 |
fungi | the zuul test nodes status graph seems to reflect things are on their way to recovery | 15:51 |
fungi | and i do see some jobs going in the gate queue now | 15:51 |
*** sandywalsh has quit IRC | 15:53 | |
sdague | ok, off to lunch | 15:53 |
*** sandywalsh has joined #openstack-infra | 15:53 | |
*** sandywalsh has quit IRC | 15:54 | |
mordred | sdague, what's going on with discover? | 15:54 |
mordred | and I see code? | 15:54 |
fungi | mordred: his repo is https://github.com/sdague/os_loganalyze | 15:54 |
jeblair | fungi: aliens deleted | 15:54 |
fungi | jeblair: thanks! | 15:54 |
fungi | the building ones are deleted now too | 15:54 |
*** cody-somerville has joined #openstack-infra | 15:55 | |
mordred | sdague: os_loganalyze/tests/ is missing an __init__.py file | 15:55 |
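The failure sdague hit is a consequence of how stdlib test discovery works: a directory without an `__init__.py` is silently skipped, so `subunit.run discover` lists nothing and the empty run "passes". A minimal sketch of the effect, assuming nothing about os_loganalyze itself (the package and file names below are throwaway):

```python
import os
import tempfile
import unittest


def discovered_test_count(pkg_name, with_init):
    # Create a throwaway package with one passing test, with or
    # without an __init__.py, and count what discovery finds.
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, pkg_name)
    os.mkdir(pkg)
    if with_init:
        open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "test_sample.py"), "w") as f:
        f.write("import unittest\n"
                "class T(unittest.TestCase):\n"
                "    def test_ok(self):\n"
                "        pass\n")
    # A fresh loader per call avoids cross-call module caching.
    return unittest.TestLoader().discover(root).countTestCases()
```

With the `__init__.py` present the loader finds the test; without it the directory is skipped, the suite is empty, and the run exits successfully anyway.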
jeblair | fungi: you probably want to 'nodepool delete' the ones in delete state as well, to speed things up | 15:56 |
fungi | jeblair: or at least they should be deleted but i see "building" state nodes in the nodepool list output with an age >7 hours still | 15:56 |
jeblair | fungi: hrm | 15:56 |
openstackgerrit | Monty Taylor proposed a change to openstack-dev/cookiecutter: Actually git add the __init__.py file https://review.openstack.org/51238 | 15:57 |
*** sandywalsh has joined #openstack-infra | 15:57 | |
mordred | sdague, fungi ^^ | 15:57 |
pabelanger | http://pastebin.com/YeMM1kiK | 15:57 |
jeblair | fungi: i just deleted one and it went away | 15:57 |
pabelanger | that's my error for cookiecutter | 15:57 |
*** sandywalsh has quit IRC | 15:57 | |
fungi | jeblair: nevermind--i think at least one of my delete jobs was hung in the background for a moment | 15:57 |
mordred | pabelanger: ah! so, you need to make it a git repo and actually commit the first commit before that will work | 15:58 |
pabelanger | mordred, Ah, I see | 15:58 |
pabelanger | okay | 15:58 |
*** sandywalsh has joined #openstack-infra | 15:58 | |
mordred | sorry, I keep meaning to hack something in so that it will a) do that for you or b) print a warning | 15:58 |
mordred | pabelanger: also - see the note to sdague above | 15:58 |
pabelanger | mordred, roger | 15:58 |
mordred | pabelanger: I forgot to git add a file :) | 15:58 |
fungi | jeblair: or not. the background jobs did all finish like i thought, i'm just getting output from nodepool on my terminal after restarting it. may not properly close its file descriptors? | 15:58 |
jeblair | fungi: never seen that | 15:59 |
jeblair | fungi: ps suggests there's at least one delete script going | 15:59 |
*** bnemec has joined #openstack-infra | 15:59 | |
fungi | huh. jobs does not list it | 15:59 |
fungi | oh, yes it does actually | 16:00 |
fungi | okay, so it's still churning apparently. may have gotten throttled after all | 16:00 |
fungi | it went silent for several minutes there | 16:01 |
jeblair | ah | 16:01 |
fungi | now it's done | 16:02 |
annegentle_ | way to go looks like you turned a corner! http://bit.ly/1afAl8w | 16:02 |
jeblair | fungi: btw, errors about '2249297' are my fault | 16:02 |
fungi | and yes, no building nodes older than 15 minutes now | 16:02 |
annegentle_ | some days I just want to be cheerleader bystander but deadlines keep getting in the way | 16:02 |
fungi | jeblair: okay, noted | 16:02 |
jeblair | fungi: i accidentally nova deleted it, but it really was building; | 16:02 |
fungi | k | 16:02 |
mordred | annegentle_: that's a sexy graph! | 16:03 |
fungi | i've got a round of 5 parallel scripts deleting the "delete" state nodes now | 16:04 |
clarkb | morning | 16:06 |
jeblair | clarkb: impeccable timing! :) | 16:07 |
pabelanger | okay cool | 16:07 |
clarkb | jeblair: looks like it | 16:07 |
pabelanger | flake8 a little sad | 16:07 |
pabelanger | but that is okay for now | 16:07 |
clarkb | bauzas: are you running tox with -r locally and are you running that in a clean git checkout? | 16:08 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 16:10 | |
*** _SergeyLukjanov has quit IRC | 16:10 | |
clarkb | jeblair: fungi: so nodepool was having a hard time with the hpcloud endpoints? | 16:11 |
clarkb | but is all better now? | 16:11 |
fungi | clarkb: that was my earlier theory, but no longer suspect that to be the case | 16:11 |
fungi | jeblair did some investigation in a debugger, found a possible deadlock but without a thread dump it was hard to pinpoint the contention | 16:12 |
clarkb | I see. Does nodepool need the zuul threaddump signal catcher? | 16:13 |
fungi | basically, that was his suggestion | 16:13 |
clarkb | that should be easy to port over. I can poke at it later | 16:13 |
jeblair | clarkb: i've got it -- almost done | 16:15 |
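The zuul-style catcher being ported here is, in essence, a handler that dumps every thread's stack on SIGUSR2 so a hung daemon can be inspected without restarting it. A hedged sketch of that kind of handler (not the actual zuul or nodepool code):

```python
import signal
import sys
import threading
import traceback


def thread_dump():
    # Map thread idents to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread: %s (%s)\n" % (names.get(ident, "?"), ident))
        lines.extend(traceback.format_stack(frame))
    return "".join(lines)


def handler(signum, frame):
    # Write the dump to stderr; a signal handler should do little else.
    sys.stderr.write(thread_dump())


if hasattr(signal, "SIGUSR2"):  # SIGUSR2 is POSIX-only
    signal.signal(signal.SIGUSR2, handler)
```

With that in place, `kill -USR2 <pid>` makes the daemon print every thread's stack, which is exactly what was missing during the deadlock hunt above.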
clarkb | cool | 16:16 |
*** matty_dubs is now known as matty_dubs|lunch | 16:16 | |
pabelanger | mordred, looks like something in the import process is messing up with flake8 | 16:16 |
pabelanger | http://pastebin.com/xxZqgJ0T | 16:16 |
pabelanger | http://pastebin.com/PVB4wDQp | 16:17 |
clarkb | pabelanger: the no newline at end of file? I think mordred filed a bug against upstream about that | 16:17 |
pabelanger | okay cool | 16:17 |
pabelanger | that was about the only other thing I see about tox being unhappy | 16:18 |
mordred | yup. I have an upstream PR up | 16:18 |
clarkb | jeblair: I think https://review.openstack.org/#/c/42393/ can probably be approved if you are happy with it | 16:19 |
clarkb | mordred: https://review.openstack.org/#/c/33926/ I haven't approved that because I don't have time to babysit (e.g. check results of change before next gerrit restart) | 16:19 |
clarkb | mordred: but if you do, feel free to approve | 16:19 |
clarkb | jeblair: https://review.openstack.org/#/c/45294/ has comments for you as well | 16:20 |
mordred | clarkb: same here | 16:21 |
pabelanger | jeblair, hope to get back into nodepool reviews today | 16:21 |
mordred | clarkb: I think when I approve that, I'll run a manual puppet agent --test on review.o.o and watch the patch output (should be null-ish) | 16:21 |
*** bnemec is now known as beekneemech | 16:22 | |
*** odyssey4me2 has joined #openstack-infra | 16:22 | |
* fungi finally had a moment to put on his "i voted" sticker | 16:23 | |
*** odyssey4me is now known as Guest36051 | 16:23 | |
*** odyssey4me2 is now known as odyssey4me | 16:23 | |
mordred | clarkb: https://review.openstack.org/#/c/48355/ could use a look from you or fungi - you guys had great comments last time | 16:25 |
*** markmc has quit IRC | 16:25 | |
*** _david_ has joined #openstack-infra | 16:27 | |
_david_ | zaro, ping | 16:27 |
clarkb | _david_: zaro isn't around this week | 16:28 |
_david_ | clarkb, thx, i fixed his patch, and wanted to ask if he can test it? | 16:28 |
_david_ | https://gerrit-review.googlesource.com/#/c/48254/ | 16:29 |
*** enikanorov-w has quit IRC | 16:29 | |
_david_ | clarkb, i tested upgrade to Gerrit schema 85 and now a permission can be granted to new system group "Change Owner". | 16:30 |
*** blamar has quit IRC | 16:30 | |
_david_ | jeblair, mordred, clarkb we host wip-plugin on gerrit-review | 16:31 |
_david_ | git clone https://gerrit.googlesource.com/plugins/wip | 16:31 |
*** markmcclain has joined #openstack-infra | 16:31 | |
*** derekh has quit IRC | 16:32 | |
clarkb | cool | 16:32 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add a thread dump signal handler https://review.openstack.org/51248 | 16:33 |
anteaya | fungi: yay! | 16:34 |
*** anteaya has quit IRC | 16:34 | |
jeblair | clarkb, fungi: fungi may still be right -- it's possible that it had a hard time with the hpcloud endpoints which caused a bug. i don't know if it was a deadlock or not; it's really hard to say. | 16:34 |
jeblair | clarkb, fungi: it's entirely possible that we were just sitting in a novaclient call, forever. | 16:35 |
fungi | #status ok the gate is moving again for the past half hour or so--thanks for your collective patience while we worked through the issue | 16:36 |
openstackstatus | NOTICE: the gate is moving again for the past half hour or so--thanks for your collective patience while we worked through the issue | 16:36 |
*** ChanServ changes topic to "Discussion of OpenStack Project Infrastructure | Docs http://ci.openstack.org/ | Bugs https://launchpad.net/openstack-ci | Code https://git.openstack.org/cgit/openstack-infra/" | 16:36 | |
*** mrodden has quit IRC | 16:36 | |
jeblair | mordred, clarkb: https://review.openstack.org/#/c/45294/ | 16:36 |
jeblair | mordred, clarkb: i don't care how stackforge projects do project management | 16:37 |
*** dkehn_ has joined #openstack-infra | 16:37 | |
jeblair | mordred: but i do care that if people have tag permissions and don't know how to use them, then we get called in to clean it up, which is complicated, takes time, and is not scalable | 16:37 |
*** jpich has quit IRC | 16:38 | |
*** dkehn has quit IRC | 16:38 | |
jeblair | mordred: so i think the best compromise between full access and no access to push tags, is that we ask that they limit the group of people who can tag to a small set who fully understand the process | 16:38 |
jeblair | mordred: is that unreasonable? | 16:38 |
mordred | jeblair: I don't think it's unreasonable, - but in this case they're asking for the group to match the group that has tag access for libra itself | 16:41 |
mordred | jeblair: so, effectively, I believe it is the thing you are asking for, AIUI | 16:41 |
mordred | LinuxJedi: ^^ right? do I grok? | 16:41 |
jeblair | mordred: that's fine then. | 16:42 |
*** thomasbiege has quit IRC | 16:42 | |
LinuxJedi | mordred: yep | 16:42 |
LinuxJedi | mordred: which is really small anyway, and only the people that do it now | 16:42 |
mordred | excellent | 16:42 |
*** _david_ has quit IRC | 16:42 | |
jeblair | LinuxJedi: the thing i wanted to ensure is that it's a small group that understands the process/dangers. sounds like that's the case. thanks. | 16:43 |
*** _david_ has joined #openstack-infra | 16:43 | |
LinuxJedi | jeblair: oh hell yes. That group is only me, Shrews, marcp and pcrews. We are the only ones that would do tagging | 16:44 |
jeblair | i have +2d | 16:44 |
Shrews | LinuxJedi: Actually, only you and I are part of the -milestone group | 16:44 |
LinuxJedi | even better | 16:44 |
_david_ | clarkb, can you ask zaro to test that patch? | 16:45 |
_david_ | because the Gerrit maintainer would like to cut stable-2.8 | 16:45 |
clarkb | _david_: yes, I will let him know when he is back | 16:45 |
_david_ | clarkb, weekend? | 16:45 |
clarkb | _david_: oh, well he is AFK until monday iirc | 16:45 |
*** thomasbiege has joined #openstack-infra | 16:46 | |
LinuxJedi | mordred: maybe a future release for gerrit/git-review should be to have a code review system for tags if there are worries. | 16:46 |
clarkb | LinuxJedi: yes! that would be awesome | 16:46 |
*** mrodden has joined #openstack-infra | 16:47 | |
*** hogepodge has joined #openstack-infra | 16:47 | |
*** dkehn has joined #openstack-infra | 16:52 | |
*** _david_ has left #openstack-infra | 16:52 | |
*** dkehn_ has quit IRC | 16:54 | |
*** matty_dubs|lunch is now known as matty_dubs | 16:54 | |
openstackgerrit | A change was merged to openstack-infra/config: Remove tuskarclient pylint job. https://review.openstack.org/49965 | 16:56 |
*** Ryan_Lane has quit IRC | 16:58 | |
*** dkehn_ has joined #openstack-infra | 17:01 | |
*** dkehn has quit IRC | 17:02 | |
mordred | LinuxJedi: yes. we're actually planning that ish | 17:03 |
mordred | LinuxJedi: or, a tool that lets you do "please make a new minor release for me" | 17:03 |
*** rahmu has joined #openstack-infra | 17:03 | |
mordred | LinuxJedi: so it knows how to find your current version, logically increment the thing you asked it to, run the tag command with -s, etc | 17:04 |
jeblair | mordred: a pbr function that implements 'python setup.py release' ? | 17:18 |
mordred | jeblair: yah. something that like | 17:19 |
mordred | something like that | 17:20 |
mordred | although I was considering making it two commands or splittable - so you could do the local tagging separate from pushing the local tag | 17:20 |
jeblair | mordred: i think that's a good idea | 17:21 |
mordred | (I usually do the tag and then do an sdist to check that it worked and stuff) | 17:21 |
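The split mordred describes (increment the version locally, tag with `-s`, push later as a separate step) could look roughly like this; `bump_version` and `tag_release` are illustrative names, not pbr's actual API:

```python
import subprocess


def bump_version(version, part):
    # Increment one component of an X.Y.Z version and zero out the
    # components after it, e.g. bump_version("1.2.3", "minor") -> "1.3.0".
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError("unknown part: %r" % part)
    return "%d.%d.%d" % (major, minor, patch)


def tag_release(version):
    # Create the signed tag locally only; pushing the tag remains a
    # separate, deliberate step (so you can sdist-check it first).
    subprocess.check_call(["git", "tag", "-s", version, "-m", version])
```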
*** hemnafk is now known as hemna | 17:22 | |
*** osanchez has quit IRC | 17:27 | |
sdague | mordred: woot | 17:28 |
sdake | jeblair re our conversation at cloudopen regarding using heat to run the gate jobs, is zuul the software that does all that? | 17:31 |
*** Ryan_Lane has joined #openstack-infra | 17:36 | |
fungi | sdake: zuul coordinates and acts as a scheduler, while jenkins handles the execution and artifact collection | 17:36 |
fungi | at least presently | 17:36 |
sdake | does jenkins execute some scripts to do the building of the vms? | 17:36 |
fungi | sdake: nodepool (and in some cases humans) do that part | 17:37 |
sdague | mordred: so curiously cookiecutter seems to trim newlines at the end of files, so it un pep8's our template | 17:37 |
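The symptom sdague describes shows up as flake8's W292 ("no newline at end of file"). Until the upstream cookiecutter fix lands, a trivial repair pass over generated files is enough; a sketch:

```python
def ensure_final_newline(text):
    # flake8 W292 fires when a file's last line has no trailing
    # newline; re-append one if the template engine dropped it.
    if text and not text.endswith("\n"):
        return text + "\n"
    return text
```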
*** Ryan_Lane has quit IRC | 17:38 | |
fungi | sdague: specifically nodepool has some pool management heuristics including a semi-predictive evaluation of current demand and uses that to try and maintain sufficient levels of available virtual machines | 17:38 |
*** arosen1 has joined #openstack-infra | 17:39 | |
fungi | as well as garbage-collecting the machines once they've been used | 17:39 |
*** thomasbiege1 has joined #openstack-infra | 17:39 | |
sdake | fungi does it use bare metal nodes as the backend, or openstack instances? | 17:40 |
*** arosen has quit IRC | 17:40 | |
mordred | sdague: that is correct. I have submitted a PR to upstream to fix it | 17:40 |
fungi | sdake: it presently uses openstack/nova-based service providers who donate resources to us | 17:40 |
fungi | sdake: though there is work underway to start testing tripleo on bare metal i think? | 17:41 |
*** melwitt has joined #openstack-infra | 17:41 | |
fungi | and have nodepool coordinate to a nova-bm/ironic environment the tripleo peeps are maintaining | 17:42 |
*** melwitt has quit IRC | 17:42 | |
fungi | though i've not been paying as close attention to that as i should, so i'm light on details there. lots of other stuff going on | 17:42 |
sdake | fungi which part of nodepool does the orchestration of the vm? | 17:42 |
*** reed has joined #openstack-infra | 17:42 | |
*** thomasbiege has quit IRC | 17:43 | |
fungi | sdake: it has an image builder which calls into the template vm and runs some shell scripts and puppet to get it into a desired state, then shuts it down and uses it to clone others | 17:43 |
hogepodge | pebelanger clarkb Do you know of anyone with free cycles to finish the review of https://review.openstack.org/#/c/49020/ ? | 17:43 |
*** SergeyLukjanov has joined #openstack-infra | 17:43 | |
fungi | sdake: and refreshes that daily | 17:43 |
*** melwitt has joined #openstack-infra | 17:44 | |
clarkb | hogepodge: fungi maybe? | 17:44 |
fungi | if by orchestration you mean setup, and not the running of the tests/jobs on a particular vm | 17:44 |
sdake | so when zuul says "hey I have another job for you" how does that get launched? | 17:44 |
fungi | sdake: zuul tells jenkins to run it and on which vm | 17:44 |
*** Ryan_Lane has joined #openstack-infra | 17:44 | |
sdague | clarkb: ok, first unit tests into the htmlifier to shore up it's behavior, and I already found a bug :) | 17:44 |
sdague | yay, tests | 17:45 |
clarkb | woot | 17:45 |
sdake | fungi so jenkins logs into the box and does some ssh commands or something? | 17:45 |
fungi | sdake: zuul knows a list of the jobs and under what circumstances they should be run and which systems can run them, and jenkins has details on what each actual job does | 17:45 |
*** boris-42 has joined #openstack-infra | 17:45 | |
fungi | sdake: a jenkins master can use multiple means of controlling its slaves, but we rely on ssh | 17:46 |
fungi | sdake: jenkins also has a java-based agent which runs on each slave and communicates state with the master | 17:46 |
sdake | cool let my brain cook on that for awhile | 17:46 |
sdake | thanks for the info fungi | 17:46 |
fungi | sdake: you're welcome. these are also covered with pretty diagrams and examples in a couple of brief slide presentations published at http://docs.openstack.org/infra/publications/ | 17:47 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add a thread dump signal handler https://review.openstack.org/51248 | 17:48 |
sdake_ | thanks bookmarked for later | 17:48 |
*** jerryz has joined #openstack-infra | 17:49 | |
*** rickerc has quit IRC | 17:49 | |
*** thomasbiege1 has quit IRC | 17:50 | |
*** dkehn_ is now known as dkehn | 17:51 | |
*** amotoki has quit IRC | 17:51 | |
*** alcabrera has quit IRC | 17:52 | |
*** thomasbiege has joined #openstack-infra | 17:53 | |
fungi | hogepodge: left a comment on it. i think you got some of your cleanup backwards in the new patchset | 17:53 |
hogepodge | fungi: I think I did too. | 17:54 |
hogepodge | fungi: :-) | 17:54 |
hogepodge | fungi: This is why I love gerrit | 17:54 |
hogepodge | fungi: Thanks. | 17:54 |
*** thomasbiege has quit IRC | 17:55 | |
fungi | my pleasure | 17:55 |
*** rickerc has joined #openstack-infra | 17:55 | |
*** odyssey4me has quit IRC | 17:56 | |
jerryz | fungi: could you tell me which dns server is used for devstack gate slaves? | 17:58 |
*** moted has quit IRC | 17:58 | |
*** nati_ueno has joined #openstack-infra | 17:59 | |
jerryz | fungi: sometimes i hit this bug/question https://bugs.launchpad.net/devstack/+bug/1190844 | 17:59 |
uvirtbot | Launchpad bug 1190844 in devstack "./stack.sh is resulting any error "/opt/stack/devstack/functions: line 1228: : No such file or directory" on stable/grizzly branch" [Undecided,Invalid] | 17:59 |
fungi | jerryz: depends on the provider i think, but i'll check | 17:59 |
reed | hello folks | 18:00 |
fungi | hello reed | 18:00 |
*** johnthetubaguy has quit IRC | 18:00 | |
fungi | jerryz: it may even vary by region/availability zone... in rackspace dfw we use 72.3.128.240 and 72.3.128.241 | 18:01 |
jerryz | fungi: thanks. i think the dns i use which is 8.8.8.8 gives me the wrong IP for cdn.download.cirros-cloud.net | 18:01 |
sdague | jerryz: the cdn for cirros got flakey some time yesterday | 18:02 |
fungi | jerryz: ahh, yes there were some cdn issues for cirros image downloads which got worked through yesterday. are you still encountering it in current runs? | 18:02 |
jerryz | fungi: for now, i just put the right ip in /etc/hosts | 18:03 |
*** gyee has joined #openstack-infra | 18:03 | |
jerryz | fungi: the cdn chosen in seattle, WA works for me | 18:04 |
fungi | okay, cool | 18:04 |
*** dkehn has quit IRC | 18:04 | |
*** alcabrera has joined #openstack-infra | 18:09 | |
jerryz | fungi sdague: the slaves from hp or rackspace for jenkins.o.o also have that problem? i had thought the dns i used was not smart enough to refresh available cdn ip addresses. | 18:10 |
*** pycabrera has joined #openstack-infra | 18:13 | |
*** zehicle_at_dell has joined #openstack-infra | 18:15 | |
*** esker has joined #openstack-infra | 18:15 | |
*** alcabrera has quit IRC | 18:16 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Rename ASRT -> AGT https://review.openstack.org/51267 | 18:20 |
jeblair | clarkb: ^ to try to make the debug log from nodepool more clear | 18:20 |
*** dizquierdo has left #openstack-infra | 18:22 | |
notmyname | jeblair: I see https://github.com/openstack/swift-bench exists now. does that mean we're good to go? well, after I commit a .gitreview doc | 18:26 |
jeblair | notmyname: yes, i think it merged last night sometime | 18:26 |
jeblair | notmyname: hold on that... | 18:27 |
notmyname | [gerrit] | 18:27 |
notmyname | host=review.openstack.org | 18:27 |
notmyname | port=29418 | 18:27 |
notmyname | project=openstack/swift-bench.git | 18:27 |
notmyname | jeblair: proposed .gitreview ^^ | 18:27 |
*** CaptTofu has quit IRC | 18:27 | |
jeblair | fungi, clarkb, mordred: http://git.openstack.org/cgit/openstack/swift-bench/tree/ | 18:27 |
jeblair | looks empty | 18:27 |
*** CaptTofu has joined #openstack-infra | 18:28 | |
notmyname | jeblair: empty? I see stuff | 18:28 |
jeblair | notmyname: the joys of load balancing; it's empty on git03.o.o | 18:29 |
notmyname | jeblair: so should I push a change or not? | 18:30 |
notmyname | jeblair: assuming that proposed .gitreview is good | 18:31 |
jeblair | notmyname: i think you're good. since you don't have any jobs yet, nothing automated is going to try to hit git.o.o. i'll fix git03 shortly. | 18:32 |
*** CaptTofu has quit IRC | 18:32 | |
notmyname | jeblair: great | 18:32 |
jeblair | notmyname: (earlier i was worried it was a sign something more serious broke) | 18:32 |
*** mestery has joined #openstack-infra | 18:32 | |
notmyname | jeblair: https://review.openstack.org/#/c/51268/ | 18:32 |
notmyname | jeblair: if you can give me your +1 there, I'll merge it and we should be off to the races | 18:33 |
jeblair | notmyname: done | 18:34 |
*** melwitt1 has joined #openstack-infra | 18:34 | |
notmyname | jeblair: thanks for your help | 18:35 |
*** dafter has quit IRC | 18:35 | |
jeblair | notmyname: no prob! | 18:35 |
*** itchsn has joined #openstack-infra | 18:36 | |
*** melwitt has quit IRC | 18:37 | |
*** dcramer_ has quit IRC | 18:37 | |
*** sarob has quit IRC | 18:38 | |
*** itchsn has quit IRC | 18:38 | |
*** CaptTofu has joined #openstack-infra | 18:38 | |
*** dkehn has joined #openstack-infra | 18:39 | |
jeblair | clarkb, mordred, fungi: ok, the swift-bench thing on git03 was just the replication race condition; i replicated again and it's updated. i'm looking forward to having salt do this. :) | 18:41 |
jeblair | notmyname: ^ all the git.o.o servers have swift-bench now | 18:42 |
notmyname | yay | 18:42 |
*** dafter has joined #openstack-infra | 18:44 | |
*** alexpilotti_ has joined #openstack-infra | 18:45 | |
ttx | jeblair: hey, nice work on unbreaking the gate! What caused the initial fail ? | 18:47 |
*** alexpilotti has quit IRC | 18:47 | |
*** alexpilotti_ is now known as alexpilotti | 18:47 | |
ttx | (if we know that) | 18:48 |
*** dcramer_ has joined #openstack-infra | 18:53 | |
mordred | jeblair: ++ salt | 18:55 |
*** Bada has joined #openstack-infra | 19:00 | |
dkranz | There seems to be a problem with https://review.openstack.org/#/c/50795/ merging | 19:05 |
dkranz | jenkins reported success an hour ago but zuul shows some of the jobs as "queued". That is strange. | 19:06 |
*** dcramer_ has quit IRC | 19:06 | |
*** melwitt has joined #openstack-infra | 19:06 | |
*** melwitt1 has quit IRC | 19:06 | |
clarkb | dkranz: the +1 verified is for your recheck. still waiting on gate tests | 19:09 |
dkranz | clarkb: ok, thanks. Guess things are really slow | 19:09 |
hub_cap | mordred: whats the status on the work we talked about in seattle? the images stuff.. im at a point where i can take any/all of it on | 19:16 |
*** dhouck_ has quit IRC | 19:16 | |
*** jog0 is now known as flashgordon | 19:17 | |
hub_cap | clarkb: ^ ^ | 19:17 |
hub_cap | flashgordon: silly handle friday? | 19:17 |
flashgordon | hub_cap: casual nick friday | 19:18 |
flashgordon | most of the nova folk do it | 19:19 |
hub_cap | oh yes im aware :) | 19:19 |
*** dcramer_ has joined #openstack-infra | 19:20 | |
* fungi thinks every day is casual nick friday (and hawaiian shirt tuesday) | 19:20 | |
mordred | hub_cap: it's - uhm. | 19:20 |
hub_cap | i thought so :) | 19:20 |
mordred | we need to add a thing to the d-g caching scripts to download the images and cache them | 19:21 |
mordred | then you're good to go | 19:21 |
*** alexpilotti has quit IRC | 19:21 | |
hub_cap | like i said, i can help w any of it :) is someone working on the d-g caching script stuff? | 19:21 |
mordred | nope | 19:24 |
*** dprince has quit IRC | 19:24 | |
hub_cap | mind if i take a stab @it? | 19:25 |
mordred | hub_cap: please do! https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/nodepool/scripts/devstack-cache.py | 19:25 |
mordred | hub_cap: is the script you want to look at | 19:25 |
hub_cap | <3 | 19:25 |
mordred | it currently has a place where it pre-downloads images referenced by devstack | 19:25 |
hub_cap | cool ill peep it and ask questions :) | 19:26 |
mordred | hub_cap: steps forward would be either just add direct curl commands to download the images | 19:26 |
mordred | hub_cap: OR - you could get fancy and read image elemens | 19:26 |
mordred | elements | 19:26 |
mordred | in https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/nodepool/scripts/prepare_devstack.sh | 19:26 |
mordred | you'll see where we pre-clone a bunch of git repos | 19:26 |
mordred | you could add needed repos there, and then read files in them to find out what images they want to download | 19:27 |
mordred | depends on how clever you want to be | 19:27 |
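The "clever" approach mordred sketches, reading element files to learn which images to pre-cache, might start with something like this; the URL pattern and the idea of scanning raw element text are assumptions for illustration, not what diskimage-builder guarantees:

```python
import re

# Rough pattern for image artifacts an element might download.
IMAGE_URL_RE = re.compile(r"https?://\S+?\.(?:img|qcow2|tar\.gz)")


def find_image_urls(text):
    # Scan an element's source text for candidate image URLs so the
    # nodepool cache script can fetch them ahead of time.
    return IMAGE_URL_RE.findall(text)
```

A caching script would then download each URL into the image cache directory, mirroring what devstack-cache.py already does for packages.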
hub_cap | ya i was wondering about that.. are we wanting to test every flavor of image elements? or just the "supported" ones from openstack perspective | 19:27 |
hub_cap | for trove, i dont think i need to test fedora/centos, i can test centos and call it a day. but thats my perspective.. | 19:28 |
hub_cap | do we have a list of official supported linux flavors? | 19:29 |
mordred | from a d-g perspective | 19:29 |
mordred | we want to precache things that jobs that run on the nodes might want to download | 19:29 |
mordred | (this is why we go through and pre-download all of the debs that devstack _might_ wind up installing, but not install them) | 19:29 |
mordred | but "might" | 19:29 |
*** mriedem has quit IRC | 19:30 | |
mordred | is as defined by the set of things actually referenced in elements in repos that we might actually run | 19:30 |
mordred | clarkb, fungi, jeblair: did you see [openstack-dev] "Thanks for fixing my patch" ? | 19:30 |
*** pycabrera is now known as alcabrera | 19:31 | |
mordred | seems like a policy amendment that might apply nicely for us too | 19:31 |
*** davidhadas has joined #openstack-infra | 19:31 | |
hub_cap | mordred: that makes sense but if someone busts out a scientific linux element, do we want to download/cache that? | 19:32 |
hub_cap | oh and imma use the image elements, just fyi, cuz they have a nice little set of image url details i dont care to duplicate | 19:33 |
clarkb | mordred I did | 19:33 |
clarkb | mordred I think we basically do that already but only when in a time crunch | 19:34 |
hub_cap | currently there are fedora/centos/ubuntu, i can cache all 3 if we think we _may_ need to test on them all | 19:34 |
clarkb | we could shift to being proactive about it | 19:34 |
fungi | mordred: in fact, i do try to do that when i'm in a situation to do so (availability and knowledge-wise) | 19:36 |
fungi | i think it's a great idea | 19:36 |
fungi | i assumed it was already an accepted workflow among our team | 19:36 |
ttx | fungi: did you guys get to the bottom of today's issue, root cause ? | 19:36 |
fungi | ttx: no, we narrowed it down but there were insufficient debugging capabilities available, so jeblair has added those to the daemon for "next time" | 19:37 |
ttx | fungi: ack | 19:38 |
ttx | at least whatever it was that caused it, it's gone now | 19:38 |
fungi | well, whatever caused it got it into a perpetual state which was cleared through a restart, but next time we can generate a thread dump and restart it right away, then have the luxury of debugging while things don't remain indefinitely unusable | 19:39 |
ttx | cinder rc2 on its way, hold to your seats | 19:44 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Change test_queries from logical AND to OR https://review.openstack.org/50160 | 19:46 |
*** arosen1 has quit IRC | 19:47 | |
*** arosen has joined #openstack-infra | 19:48 | |
clarkb | mordred: do you have any more ideas on openstack_citest mysql perms? I think granting create and drop globally is necessary | 19:53 |
mordred | clarkb: I believe you are correct | 19:59 |
*** zehicle_at_dell has quit IRC | 19:59 | |
mordred | clarkb: otherwise, we could use mysql sandbox to spin up per-testrun mysqls and tear them down afterwards... | 19:59 |
* mordred hides | 20:00 | |
flashgordon | if anyone is looking for reviews to do, hacking has some reviews that need some attention https://review.openstack.org/#/q/status:open+project:openstack-dev/hacking,n,z | 20:01 |
fungi | mordred: isn't that basically what ceilo's mongodb functional tests use? | 20:02 |
jeblair | fungi, mordred, clarkb: yeah, i believe that has been accepted around infra repos, and i would expect it to be considered on-form in openstack repos too | 20:02 |
jeblair | fungi, mordred, clarkb: one good reason not to do that for many patches in infra is to help people learn about our systems -- | 20:03 |
flashgordon | clarkb: btw you are the most active reviewer in all of openstack | 20:03 |
flashgordon | http://russellbryant.net/openstack-stats/all-reviewers-180.txt | 20:03 |
jeblair | a lot of folks come in and say "i want to try to figure this out", and you know, teaching to fish and all. | 20:03 |
fungi | agreed. best reserved for urgent issues. it's not like we have tons of time to spare fixing up non-urgent changes | 20:05 |
ttx | cinder rc2 out | 20:05 |
* ttx calls it a day | 20:05 | |
jeblair | clarkb: congrats! | 20:05 |
mordred | clarkb: w00t! | 20:06 |
mordred | wow. I'm 14th | 20:06 |
clarkb | flashgordon: I saw that and was a little surprised. some of that is from mass rechecks though | 20:06 |
mordred | clarkb: ssh | 20:06 |
clarkb | :) | 20:06 |
jeblair | clarkb: you leave votes with rechecks? | 20:06 |
clarkb | jeblair no | 20:06 |
mordred | clarkb: actually, that's only tracking votes | 20:07 |
jeblair | clarkb: then... no? :) | 20:07 |
clarkb | is that only votes o_O | 20:07 |
clarkb | wow | 20:07 |
flashgordon | clarkb: most are in infra it looks like | 20:07 |
lifeless | wow, I'm up there | 20:07 |
clarkb | my review queue is huge. I try to stab at it as often as possible | 20:07 |
lifeless | and wth is dripton ? | 20:07 |
lifeless | http://russellbryant.net/openstack-stats/all-reviewers-30.txt | 20:08 |
lifeless | 11th, yay. | 20:08 |
hub_cap | thats it, im +1'ing random shit for the next 30 days | 20:08 |
lifeless | yeah, no. | 20:09 |
hub_cap | hahaha | 20:09 |
hub_cap | lifeless: great work, +1.. everything | 20:09 |
jeblair | hub_cap: that'll show up in the +/- column | 20:09 |
hub_cap | i know ill be a +1 baller jerryz | 20:09 |
hub_cap | *jeblair | 20:09 |
hub_cap | tab-fail | 20:09 |
sdague | hub_cap: yeh, that's why the % pos and conflict columns are there to try to sanity check things | 20:10 |
sdague | if you see 90+% pos, the person has missed the point of reviewing | 20:10 |
jeblair | for quite some time, clarkb and i have both maintained an 80% average | 20:10 |
hub_cap | 66%.. thats at least 20% worth of system-gaming i can do... numbers make u look good right?? | 20:11 |
hub_cap | ;) | 20:11 |
flashgordon | sdague: looks like i am missing the point http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt | 20:12 |
mordred | flashgordon: me too | 20:12 |
jeblair | hub_cap: yeah, i think you could be 20% nicer, but you wouldn't be you. | 20:12 |
flashgordon | lifeless: ^^ | 20:12 |
hub_cap | jeblair: TRU | 20:12 |
mordred | although part of my problem is that when I'm -1 I tend to poke someone in IRC to ask/chat about it | 20:12 |
mordred | I really need to leave that in the system more | 20:13 |
hub_cap | mordred: i stopped doing that | 20:13 |
hub_cap | i was super low on tracked reviews | 20:13 |
jeblair | mordred: i do both | 20:13 |
mordred | jeblair: I need to do both | 20:13 |
hub_cap | i looked like a schmuck (well more of one than normal) | 20:13 |
hub_cap | yeah ptl of trove has 2 reviews in the past 30 days | 20:13 |
mordred | jeblair: can we set up a bot that will let me comment on gerrit things? | 20:13 |
mordred | jeblair: so I can say #bot -1 13415 I have issues with this | 20:13 |
mordred | ? | 20:13 |
hub_cap | emacs has a command for that mordred | 20:14 |
mordred | hub_cap: good point | 20:14 |
jeblair | mordred: yeah, what could go wrong with giving an irc bot super-super-admin access in gerrit? | 20:14 |
lifeless | flashgordon: ? | 20:14 |
mordred | jeblair: can't see anything wrong with that | 20:14 |
fungi | how did i get to #6? i feel perpetually behind on reviews :/ | 20:15 |
jeblair | mordred: you're now at 92% positive reviews! ;) | 20:15 |
mordred | flashgordon: I do find it interesting that my 30 day percentage is about == to my 180 day | 20:15 |
mordred | jeblair: I am? | 20:15 |
*** adalbas has quit IRC | 20:15 | |
mordred | oh - just in infra? | 20:15 |
fungi | ugh, i'm also by far the most "positive" reviewer in the top 10 besides mordred | 20:15 |
jeblair | mordred: sorry, EJOKE | 20:15 |
flashgordon | lifeless: my review stats are not good http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt | 20:15 |
lifeless | flashgordon: you are fairly positive | 20:15 |
lifeless | flashgordon: OTOH you've been reviewing code mainly written by experienced core folk | 20:16 |
jeblair | mordred: er, the idea was that you gave a cursory +1 to the idea of giving an irc bot super-super-admin access to gerrit. | 20:16 |
flashgordon | lifeless: I have been doing a lot of -0s and not -1s | 20:16 |
fungi | mordred: we need to be more curmudgeoney apparently | 20:16 |
flashgordon | lifeless: yeah | 20:16 |
lifeless | flashgordon: I don't think that counts against you; in fact I'd kindof like to see a partitioned metric | 20:16 |
lifeless | reviews vs core | 20:16 |
lifeless | reviews vs noncore | 20:17 |
hub_cap | fungi: maybe yall just do such good work that theres nothing to -1 and this system doesnt accurately track that | 20:17 |
lifeless | I suspect it would be interesting | 20:17 |
lifeless | rustlebee: ^ But I have no plans to implement just yet :P | 20:17 |
flashgordon | lifeless: it would be interesting | 20:17 |
rustlebee | track all the things | 20:18 |
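The partitioned metric lifeless suggests (reviews against core-authored changes vs. reviews against noncore-authored changes) can be sketched as below. The tuple input format is an assumption for illustration; real data would come from Gerrit queries, as the reviewstats scripts do.

```python
# Sketch of a partitioned review metric: split each reviewer's stats by
# whether the change owner is a core reviewer. Input shape is invented.
from collections import defaultdict

def partition_stats(reviews, core_members):
    """reviews: iterable of (reviewer, change_owner, vote) tuples."""
    stats = defaultdict(lambda: {"core": [0, 0], "noncore": [0, 0]})
    for reviewer, owner, vote in reviews:
        bucket = "core" if owner in core_members else "noncore"
        total, negative = stats[reviewer][bucket]
        stats[reviewer][bucket] = [total + 1, negative + (vote < 0)]
    return {
        reviewer: {
            bucket: {"reviews": t,
                     "negative_pct": round(100.0 * n / t, 1) if t else 0.0}
            for bucket, (t, n) in buckets.items()
        }
        for reviewer, buckets in stats.items()
    }
```

With such a split, a high positive percentage against core-authored patches (lifeless's point) would stop counting against a reviewer.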
* mordred needs to go back to reviewing first thing in the morning | 20:19 | |
mordred | and clearing the entire outstanding queue down | 20:19 |
clarkb | mordred: I can't do that because by 9:30 am PST all the fun stuff is happening | 20:20 |
clarkb | I have found post dinner to be good for reviews | 20:20 |
mordred | clarkb: wake up at 6:30am PST like fungi and I! | 20:21 |
fungi | i also think one of the things which keeps my review average on the positive side, besides addressing concerns via irc only (which i should definitely stop doing) is not leaving a negative vote if someone else already has unless it's for a different issue, even if i reviewed the current state of the patch | 20:21 |
mordred | yah | 20:21 |
fungi | i should probably just get in the habit of it, and not worry so much about people potentially getting offended by negative score dogpiling | 20:22 |
lifeless | fungi: I usually do what you describe there | 20:23 |
mordred | clarkb: btw - you're welcome for the email I just sent ;) | 20:23 |
lifeless | fungi: often if a patch has a -1 already, I won't even review it | 20:23 |
lifeless | other than a cursory check to see if the submitter replied saying 'no, I disagree' | 20:23 |
mordred | for the folks here who are not HP employees (shocking) I just sent an email to the internal openstack interest mailing list with the subject "Clark Boylan is the most active reviewer in all of OpenStack for Icehouse" | 20:23 |
mordred | lifeless: I actually have a search filter that keeps me from seeing things with a -1 | 20:24 |
mordred | but largely that's because if jeblair or clarkb or fungi have -1'd something, it's pretty darned solid | 20:25 |
*** yolanda has quit IRC | 20:25 | |
*** sandywalsh has quit IRC | 20:26 | |
fungi | especially if i -1'd it... must have been written in go or something | 20:27 |
*** weshay has quit IRC | 20:27 | |
mordred | rustlebee: wow. your -2 count is so high! | 20:27 |
rustlebee | feature freeze did it probably | 20:28 |
sdague | mordred: feature freeze does that | 20:28 |
mordred | ah | 20:28 |
rustlebee | but i do tend to have a higher -2 count than most anyway :) | 20:28 |
jeblair | so it looks like today is exercising the nodepool burst code | 20:28 |
rustlebee | i love saying "NO!" | 20:28 |
mordred | neat | 20:28 |
sdague | you would not believe the crazy that people push after FF :) lots of people don't pay attention to the calendar | 20:28 |
mordred | we should have a "Block" button which isn't tied to code review | 20:29 |
fungi | or to the mailing list or to irc or to other people's comments on their other reviews or | 20:29 |
jeblair | if you look at the nodepool graph, the top of the green line isn't flat anymore; i think whenever that's the case, and the green line is above its normal level, it's bursting due to demand from gearman | 20:29 |
jeblair | (if it's not flat and it's below the normal level, we're hitting max capacity) | 20:30 |
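jeblair's reading of the nodepool graph can be encoded as a small heuristic: a jagged (non-flat) top above the normal level means demand-driven bursting, a jagged top below it means the pool is pinned at max capacity. The jitter threshold here is invented for illustration, not a real nodepool value.

```python
# Rough classifier for the nodepool graph heuristic described above.
# samples: recent node counts; normal_level: the usual flat ceiling.
def classify_nodepool(samples, normal_level, jitter=2):
    flat = max(samples) - min(samples) <= jitter
    if flat:
        return "steady"
    avg = sum(samples) / len(samples)
    return "bursting" if avg > normal_level else "at-capacity"
```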
mordred | jeblair: oh neat! | 20:30 |
lifeless | oh btw infra people | 20:30 |
lifeless | tripleo now has an externally accessible trunk deployed kvm cloud | 20:31 |
lifeless | updates every 40m +- | 20:31 |
mordred | lifeless: when you say updates - you mean goes away and comes back, yeah? | 20:31 |
lifeless | erm, trunk of OpenStack's API services etc, for clarity (it's not /just/ trunk of tripleo's code:) | 20:32 |
lifeless | mordred: yes, making it preserve vm's is the next MVP | 20:32 |
mordred | neat! | 20:32 |
lifeless | mordred: and after that having it not interrupt shit | 20:32 |
lifeless | right now hiera has credentials for infra on the grizzly kvm cloud | 20:32 |
lifeless | which should be very reliable as it's entirely static | 20:32 |
jeblair | lifeless: i'm excited about all of that | 20:33 |
lifeless | this is just a headsup on where the next thing is at | 20:33 |
* mordred can't wait until we add some nodepool load to your CD cloud so we can watch you update under piles of load | 20:33 | |
jeblair | ++ | 20:33 |
lifeless | yay :) | 20:34 |
lifeless | jeblair: I believe there is a nodepool bug preventing the tripleo experimental job being enabled? Can we help with that? | 20:34 |
jeblair | lifeless: i think we have the nodepool changes in place to improve our chances of using the grizzly cloud without blowing everything up. | 20:34 |
jeblair | lifeless: i'm not sure all of them are in the running nodepool yet | 20:35 |
jeblair | lifeless: but perhaps this weekend we can restart nodepool and put that in again | 20:35 |
jeblair | since we had a fire this morning, i want to take it easy for a while to give things a chance to catch up and hopefully minimize impact to the release process | 20:36 |
mordred | ++ | 20:36 |
*** prad_ has quit IRC | 20:36 | |
lifeless | jeblair: ack, thanks | 20:37 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Rename ASRT -> AGT https://review.openstack.org/51267 | 20:37 |
sdague | clarkb: you about? | 20:37 |
sdague | I wanted to get your take on the os_loganalyze tree to figure out what more I should do before we start connecting it up to the log server. Current code won't really change anything, but at least now I have a framework of test in place so I can figure out if I break something. | 20:40 |
*** Bada has quit IRC | 20:41 | |
sdague | so I can feel confident in doing the keystone and swift log support | 20:41 |
openstackgerrit | A change was merged to openstack-infra/gitdm: add user to openstack-config https://review.openstack.org/50425 | 20:41 |
mordred | wow. that's such a good commit message | 20:41 |
*** dafter has quit IRC | 20:41 | |
mordred | the other reason my review % is so high is that I keep reviewing jeblair code. | 20:42 |
jeblair | mordred: nice, now i can't say anything bad about your 90% average. :) | 20:42 |
boris-42 | jeblair hi | 20:43 |
jeblair | boris-42: hello | 20:43 |
clarkb | sdague: ish. train wifi/verizon not so good | 20:43 |
boris-42 | jeblair how are you? | 20:43 |
mordred | jeblair: :) | 20:44 |
jeblair | boris-42: i am well. how are you? | 20:44 |
*** briancline has quit IRC | 20:44 | |
clarkb | sdague: I think starting with a 1:1 move is good then we can tack on bug fixes | 20:44 |
mordred | jeblair: any reason I should not APRV a nodepool change? I kinda feel like you should handle landing those at the moment - am I being overly cautious? | 20:44 |
*** tvb|afk has joined #openstack-infra | 20:44 | |
boris-42 | jeblair nice thanks. I would like to add benchmarking & profiling tool to OpenStack CI =) so probably you will be interested | 20:45 |
*** ruhe has joined #openstack-infra | 20:45 | |
jeblair | mordred: nope -- as long as it doesn't require a coordinated config file change, should be safe. nodepool doesn't auto-restart, so it doesn't take effect until we restart it manually for some reason. | 20:45 |
*** ruhe has quit IRC | 20:46 | |
jeblair | boris-42: yes, very much! do you think it would be a good idea to send an email to openstack-infra@lists.openstack.org to tell us a bit about the tool? | 20:46 |
*** thomasm has quit IRC | 20:46 | |
boris-42 | jeblair could we move to #openstack-rally | 20:46 |
*** alcabrera has quit IRC | 20:47 | |
sdague | boris-42: I think it would be better here | 20:47 |
sdague | having a million subchannels doesn't help keep folks on board | 20:48 |
boris-42 | sdague jeblair it's a separate project … but ok | 20:48 |
boris-42 | sdague jeblair here is the wiki https://wiki.openstack.org/wiki/Rally | 20:48 |
markmcclain | so looks like we hit the time limit on py26 neutron tests... | 20:48 |
markmcclain | http://logs.openstack.org/08/50608/2/gate/gate-neutron-python26/de9ae8c/console.html | 20:48 |
boris-42 | sdague jeblair actually the official announcement will be this Monday.. | 20:48 |
markmcclain | it actually succeeded, but the gate failed since it ran over the hour | 20:49 |
jeblair | markmcclain: do neutron unit tests really take twice as long as a full tempest run? | 20:49 |
sdague | markmcclain: yeh, an hour is pretty long | 20:49 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Add a thread dump signal handler https://review.openstack.org/51248 | 20:49 |
markmcclain | I'm surprised by the runtime a bit | 20:50 |
sdague | it's seemingly not doing anything for the first 15 minutes | 20:50 |
sdague | figuring out why, would be helpful | 20:50 |
sdague | jeblair: they did pass 40 mins on py26 during rc phase | 20:50 |
clarkb | is git being slow again? | 20:50 |
sdague | so if there was some new 15 min delay, I could see that smashing into 60 | 20:50 |
sdague | clarkb: I don't know | 20:51 |
sdague | 2013-10-11 18:44:23.275 | Building remotely on centos6-6 in workspace /home/jenkins/workspace/gate-neutron-python26 | 20:51 |
sdague | 2013-10-11 19:02:42.590 | [gate-neutron-python26] $ /bin/bash -xe /tmp/hudson3680450177999363028.sh | 20:51 |
clarkb | git seems fine. that delay at the beginning is weird though | 20:52 |
jeblair | clarkb: hrm, the 15 min delay looks like it's from jenkins | 20:52 |
clarkb | jeblair: ya | 20:52 |
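The gap sdague spotted can be computed directly from the two console lines he pasted, whose timestamps lead each log line:

```python
# Compute the idle gap between two Jenkins console log lines of the
# form "YYYY-MM-DD HH:MM:SS.mmm | message".
from datetime import datetime

def gap_minutes(line_a, line_b):
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    stamps = [datetime.strptime(line.split(" | ")[0], fmt)
              for line in (line_a, line_b)]
    return (stamps[1] - stamps[0]).total_seconds() / 60

a = "2013-10-11 18:44:23.275 | Building remotely on centos6-6 in workspace /home/jenkins/workspace/gate-neutron-python26"
b = "2013-10-11 19:02:42.590 | [gate-neutron-python26] $ /bin/bash -xe /tmp/hudson3680450177999363028.sh"
```

For these two lines the gap works out to roughly 18 minutes before the job script started.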
*** melwitt has quit IRC | 20:53 | |
hub_cap | so given that we want to cache the images for dib in the d-g jobs, its probably safe to assume we should run the entire 10-* script that does the work, ya? example: https://github.com/openstack/diskimage-builder/blob/master/elements/ubuntu/root.d/10-cache-ubuntu-tarball | 20:54 |
jeblair | boris-42: ok, how can we help you? | 20:54 |
boris-42 | jeblair Rally is able to deploy cloud and test it, or just test it=) | 20:54 |
hub_cap | otherwise if we put dib --offline, we will only have done 1/10'th of the work to make the image dib usable | 20:55 |
jeblair | boris-42: which do you want to do first? | 20:55 |
boris-42 | jeblair to test it it requires only endpoints of cloud | 20:55 |
hub_cap | what say you to that lifeless? (see my last 2 msgs) | 20:55 |
boris-42 | jeblair at the end I would like to deploy and test | 20:55 |
jeblair | clarkb: the current job running on centos6-6 did not have a delay | 20:55 |
clarkb | jeblair: could a job have taken over the node before eg bug in gearman plugins locking? | 20:56 |
boris-42 | jeblair Rally will support different deploy engines (at moment only DevStack) but in future TripleO and Fuel | 20:56 |
jeblair | boris-42: so we have added some hooks to the devstack-gate script that let you use a lot of the functionality in it | 20:56 |
* fungi is popping out for an early dinner, but will return soon | 20:56 | |
openstackgerrit | A change was merged to openstack-infra/config: Fix sqlalchemy-migrate py26/sa07 job https://review.openstack.org/44686 | 20:56 |
clarkb | so 15 minutes of some other job running? | 20:57 |
openstackgerrit | A change was merged to openstack-infra/config: Add tagging permissions to python-libraclient https://review.openstack.org/45294 | 20:57 |
boris-42 | jeblair to test existing cloud I should have only Rally & cloud enpoints | 20:57 |
jeblair | boris-42: so you should be able to write a job that runs rally on a cloud set up by devstack | 20:57 |
jeblair | boris-42: or you can write a job like devstack-gate that uses rally to set up a cloud instead of devstack | 20:57 |
*** senk has joined #openstack-infra | 20:58 | |
boris-42 | jeblair interesting, ok I think it will be simpler to start by writing just a job that runs tests against your already deployed devstack cloud | 20:58 |
sdague | anyone up for helping me get this tree into gerrit? | 20:59 |
*** julim has quit IRC | 20:59 | |
jeblair | boris-42: ok. you can look at the swift-devstack-vm-functional jobs for an example of how to do something like that | 21:00 |
boris-42 | jeblair thank you! | 21:01 |
boris-42 | jeblair will try on next week=) | 21:01 |
clarkb | sdague: I can try. reading ci.openstack.org/stackforge.html is a good place to start | 21:01 |
*** melwitt has joined #openstack-infra | 21:01 | |
jeblair | clarkb: https://jenkins02.openstack.org/job/gate-nova-python26/6747/console https://jenkins02.openstack.org/job/gate-neutron-python26/2257/console https://jenkins02.openstack.org/job/gate-horizon-python26/1379/console | 21:02 |
jeblair | clarkb: that's the job before, the neutron job, and the job after | 21:02 |
jeblair | timestamps don't seem to overlap | 21:02 |
jeblair | clarkb: and neither the job before or after did that | 21:02 |
jeblair | i'm leaning toward 'jenkins got busy' or 'jenkins got semi-deadlocked' or 'jenkins garbage collected' or, well, in general, just blaming jenkins for being jenkins. | 21:03 |
clarkb | jeblair: this is weird. ya jenkins for being jenkins seems plausible | 21:03 |
*** jerryz has quit IRC | 21:04 | |
jeblair | markmcclain, sdague: so it looks like 15 min of that runtime is jenkins derping. let's call that a fluke for the moment, unless it happens with significant regularity. | 21:04 |
sdague | jeblair: sounds fair | 21:04 |
sdague | clarkb: ok, I'm assuming this will live in openstack-infra/ and will propose a patch accordingly | 21:05 |
jeblair | sdague, clarkb: ++ | 21:05 |
*** senk has quit IRC | 21:06 | |
clarkb | sdague yup. the stackforge page is a decent template for what you need though | 21:07 |
*** miqui has quit IRC | 21:08 | |
*** matty_dubs is now known as matty_dubs|gone | 21:09 | |
sdague | what's included in python-jobs? | 21:10 |
clarkb | pep8 pythonXX and pypy | 21:10 |
clarkb | also gate-*-docs | 21:10 |
clarkb | and coverage | 21:10 |
*** lcestari has quit IRC | 21:14 | |
*** sarob has joined #openstack-infra | 21:14 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: add os-loganalyze to gerrit & zuul https://review.openstack.org/51299 | 21:15 |
sdague | so, that about right? | 21:15 |
*** CaptTofu has quit IRC | 21:17 | |
*** CaptTofu has joined #openstack-infra | 21:17 | |
clarkb | sdague the pep8 and python jobs are just gate-* no check-* | 21:17 |
sdague | ok | 21:18 |
sdague | let me fix that quick | 21:18 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: add os-loganalyze to gerrit & zuul https://review.openstack.org/51299 | 21:18 |
*** anteaya has joined #openstack-infra | 21:19 | |
anteaya | clarkb: I am meeting all sorts of elastic search people | 21:19 |
*** SergeyLukjanov has quit IRC | 21:19 | |
dkranz | clarkb: My tempest job watcher thinks only four tempest gate jobs have finished in the past few hours. Is it wrong or did I just pick a bad time to start with this? | 21:19 |
anteaya | do you have a list of bugs or an etherpad that outlines your current pain points with logstash and elastic search so I can read up and ask intelligent questions | 21:19 |
*** SergeyLukjanov has joined #openstack-infra | 21:20 | |
anteaya | and maybe find out something useful for you? | 21:20 |
clarkb | anteaya: I don't they are fairly nebulous around scaling | 21:20 |
clarkb | sdague lgtm | 21:20 |
anteaya | clarkb: yeah that is what I understood | 21:20 |
clarkb | anteaya I need to upgrade to latest next week | 21:20 |
clarkb | newer versions are supposed to be better | 21:20 |
anteaya | do you think that will address some of the current scaling issues? | 21:20 |
anteaya | k | 21:21 |
anteaya | I'll ask about versions tomorrow | 21:21 |
*** mrodden has quit IRC | 21:21 | |
anteaya | what version of logstash and elastic search are we using right now | 21:21 |
anteaya | and what do you want to go to next week? | 21:21 |
clarkb | yes es memory use is much better in 0.90.X apparently | 21:21 |
sdague | clarkb: next time you are in logstash, I have requests for 2 pieces of metadata to get added to the runs | 21:21 |
sdague | 1) cloud-az | 21:21 |
sdague | 2) branch | 21:22 |
*** CaptTofu has quit IRC | 21:22 | |
clarkb | dkranz: I don't know. currently on a poor connection. | 21:23 |
anteaya | clarkb: the one bit of info I got from my after dinner walk around Budapest companions, who just happen to have an elastic search as a service company - what luck - is that they run many small clusters rather than large clusters | 21:23 |
clarkb | sdague: noted | 21:23 |
sdague | clarkb: thanks :) | 21:24 |
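sdague's request amounts to attaching two extra fields to each event before it reaches logstash. The sketch below is a hedged illustration only: the field names and event shape are assumptions, not the actual infra log-pusher code.

```python
# Hypothetical annotation step for the logstash pipeline: tag each
# event with the cloud AZ the job ran in and the branch under test.
def annotate_event(event, cloud_az, branch):
    fields = event.setdefault("@fields", {})
    fields["cloud_az"] = cloud_az  # e.g. which provider/AZ ran the job
    fields["branch"] = branch      # e.g. "master" or "stable/havana"
    return event
```

Tagging at submission time means both fields become queryable facets in Kibana without reparsing the logs.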
dkranz | clarkb: ok, given that my patch is still hung in zuul almost 4 hours later perhaps it is just slow | 21:24 |
anteaya | I'm not sure how the size of our cluster would be characterized | 21:24 |
clarkb | anteaya: interesting I wonder how they shard across clusters | 21:24 |
anteaya | I can ask | 21:24 |
*** esker has quit IRC | 21:27 | |
sdague | clarkb: ok, jenkins did a +1 - https://review.openstack.org/#/c/51299/ | 21:28 |
*** vipul is now known as vipul-away | 21:28 | |
*** vipul-away is now known as vipul | 21:28 | |
sdague | jeblair, you got a sec to check that out as well? | 21:28 |
sdague | I'd like to get this over so I can at least call that part good before the weekend, if possible :) | 21:29 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 21:32 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 21:32 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 21:33 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 21:33 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 21:33 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 21:33 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 21:34 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 21:34 | |
*** blamar has joined #openstack-infra | 21:39 | |
*** vipul is now known as vipul-away | 21:43 | |
openstackgerrit | A change was merged to openstack-dev/pbr: Do not pass unicode where byte strings are wanted https://review.openstack.org/48355 | 21:47 |
fungi | sdague: you still have teh typoz | 21:50 |
*** anteaya has quit IRC | 21:51 | |
*** vipul-away is now known as vipul | 21:54 | |
*** mgagne has quit IRC | 21:56 | |
fungi | so as far as the py26 unit test timeout, i see than jenkins02 is in the midst of one of those use-all-the-things fits and is well on its way to memory exhaustion as a result... http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=41&page=2 | 21:58 |
fungi | s/than/that/ | 21:58 |
fungi | i give it 30-60 minutes before available ram is full | 21:59 |
fungi | though looking at the swap graph, yesterday's oom condition didn't happen until it reached around 0.5g swap used and then suddenly spiked in a matter of 10-20 minutes until it was up to 2g swap | 22:01 |
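fungi's observation, that yesterday's OOM only spiraled once swap use crossed about 0.5g, suggests a simple early-warning check. This is a generic Linux /proc/meminfo sketch, not infra's actual monitoring; the threshold is taken from the conversation.

```python
# Minimal swap-pressure check: parse /proc/meminfo text and alert once
# swap usage crosses the ~0.5 GiB level that preceded the last OOM.
def swap_used_gib(meminfo_text):
    vals = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("SwapTotal", "SwapFree"):
            vals[key] = int(rest.split()[0])  # values are in kB
    return (vals["SwapTotal"] - vals["SwapFree"]) / (1024 * 1024)

def should_alert(meminfo_text, threshold_gib=0.5):
    return swap_used_gib(meminfo_text) >= threshold_gib
```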
clarkb | :/ is there a newer version of jenkins out? we could try upgrading | 22:02 |
jeblair | fungi can you gracefully stop and restart it? | 22:02 |
fungi | jeblair: i definitely can | 22:02 |
fungi | was wondering if we wanted to troubleshoot further first, since we've caught it in this state | 22:02 |
fungi | i'm checking the thread count real quick | 22:02 |
*** senk has joined #openstack-infra | 22:03 | |
jeblair | i am afk and not useful | 22:03 |
*** dkranz has quit IRC | 22:03 | |
fungi | no worries--i'm collecting what details i can first | 22:03 |
fungi | but will definitely try to cycle it here in a moment and see if that helps | 22:03 |
*** gyee has quit IRC | 22:03 | |
clarkb | ++ | 22:04 |
*** pcm_ has quit IRC | 22:04 | |
clarkb | I cant help for a bit but should have proper wifi in about an hour | 22:04 |
*** SergeyLukjanov has quit IRC | 22:04 | |
fungi | thread count is highish but reasonable. not like that other time where it went batty | 22:05 |
*** SergeyLukjanov has joined #openstack-infra | 22:05 | |
fungi | Threads on jenkins02.openstack.org@166.78.48.99: Number = 1,935, Maximum = 3,390, Total started = 106,699 | 22:05 |
Steely_Spam | https://review.openstack.org/#/c/49622/ | 22:05 |
Steely_Spam | is that hanging out because it got a -1 during check? | 22:05 |
Steely_Spam | I didn't think that was a thing | 22:06 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 22:06 | |
fungi | for comparison... | 22:06 |
fungi | Threads on jenkins01.openstack.org@166.78.188.99: Number = 1,422, Maximum = 19,590, Total started = 807,517 | 22:06 |
*** _SergeyLukjanov is now known as SergeyLukjanov | 22:06 | |
*** senk has quit IRC | 22:07 | |
clarkb | no it should clear the -1 and move on. that is why zuul leaves a gate jobs starting comment | 22:07 |
Steely_Spam | clarkb: okay, I thought so... | 22:08 |
fungi | clarkb: Steely_Spam: though in this case i'm not finding it on the zuul status page | 22:08 |
Steely_Spam | fungi: right, it's not in the queue for some reason | 22:09 |
fungi | it got a new patchset upload after it was approved but before it merged, then got approved again | 22:09 |
*** jerryz has joined #openstack-infra | 22:09 | |
Steely_Spam | maybe a reverify would kick it? | 22:09 |
fungi | it's possible it was re-approved while the previous patchset was still in the process of waiting to be kicked out of today's extremely slow gate | 22:09 |
fungi | Steely_Spam: so, yes, try to reverify and see if jenkins leaves a new "starting gating" comment on it after that | 22:10 |
* Steely_Spam tries | 22:10 | |
jerryz | fungi: it is still not unusual for me to run into this bug: https://bugs.launchpad.net/openstack-ci/+bug/1225664 | 22:10 |
uvirtbot | Launchpad bug 1225664 in openstack-ci "tempest.api.volume.test_volumes_actions.VolumesActionsTestXML flakey failure" [High,Triaged] | 22:10 |
Steely_Spam | fungi: related question: can I put a recheck/reverify command on the first line and more comment below it, or does the whole comment have to be just the command in order to work? | 22:10 |
fungi | jerryz: did you hit it recently? | 22:11 |
jerryz | fungi: i also see in e-r status report several reviews also fail due to that bug | 22:11 |
Steely_Spam | fungi: yes, that kicked it and like ten behind it, thanks :) | 22:11 |
fungi | Steely_Spam: no, it's a very strict match right now, no comments in the same post. i usually leave a second comment with my details | 22:11 |
Steely_Spam | fungi: okay, I've been doing the same, just wondering | 22:11 |
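The strict matching fungi describes can be illustrated with a regex: the whole comment must be just the command, so any trailing explanation defeats it. This pattern is a simplified stand-in for zuul's actual trigger configuration, not a copy of it.

```python
# Simplified illustration of a strict recheck/reverify comment match:
# the comment must contain the command and nothing else.
import re

COMMAND = re.compile(r"^(recheck|reverify)( (bug \d+|no bug))?$")

def is_command(comment):
    return bool(COMMAND.match(comment.strip()))
```

Hence fungi's habit of posting the command alone and putting details in a second comment.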
jerryz | fungi: my code base tested should be two or three days ago | 22:11 |
jerryz | fungi: but in e-r 's report, recent reviews also hit similar failure | 22:12 |
fungi | jerryz: it's also possible the elastic-recheck criteria for matching that issue are too vague and catching more than one problem under that umbrella | 22:12 |
fungi | jerryz: link to a recent failure or the report you're talking about? | 22:13 |
jerryz | Affecting changes: 42523, 46696, 46479, 46206, 46598, 45306, 46738, 46777, 46219, 46792, 42240 | 22:13 |
jerryz | https://review.openstack.org/#/c/42240/ | 22:13 |
*** dcramer_ has quit IRC | 22:14 | |
*** tvb|afk has quit IRC | 22:14 | |
fungi | jerryz: thanks--i'll try to take a look in a bit once i've got jenkins02 back under control | 22:14 |
fungi | heh... top reports the jvm on jenkins02 is using 40g of virtual memory. it doesn't have but 32g including swap | 22:15 |
fungi | must be shared | 22:15 |
fungi | resident is 26g though | 22:16 |
fungi | okay, jenkins02 is preparing for shutdown. i'll restart the service once all jobs complete | 22:17 |
fungi | probably about 30 minutes | 22:18 |
jeblair | fungi i think nodepool is running the new code that should shift load to jenkins01. you may want to keep an eye on jenkins01. | 22:19 |
fungi | yeah, as of this morning's restart. i was thinking about that as well | 22:20 |
jeblair | since theres a lot of untested stuff going on. | 22:20 |
jeblair | if jenkins01 gets overloaded we may need to add a cap in nodepool. | 22:21 |
fungi | jerryz: okay, i see that's the swift storage cap being exceeded? you might see if afazekas wants to work on enlarging that since he did the past couple of changes for it (or propose a similar one?) | 22:21 |
fungi | jeblair: definitely agree | 22:21 |
jgriffith | jerryz: question on that... | 22:21 |
jgriffith | jerryz: which case of it are you seeing? | 22:21 |
jeblair | fungi if there is a prob you can adjust provider max values in nodepool.yaml to quickly get a similar effect. | 22:22 |
fungi | jeblair: noted--thanks! | 22:22 |
*** SergeyLukjanov has quit IRC | 22:22 | |
jerryz | jgriffith: https://review.openstack.org/#/c/46531 and https://review.openstack.org/#/c/42240/ | 22:23 |
jerryz | those are recent failures | 22:23 |
fungi | i'll afk for a few minutes while jenkins02 finishes up and brb | 22:24 |
jgriffith | jerryz: interesting... 500 failure back from the glance client | 22:26 |
jgriffith | jerryz: http://logs.openstack.org/31/46531/6/gate/gate-tempest-devstack-vm-postgres-full/aa0cbc2/logs/screen-c-vol.txt.gz#_2013-10-10_15_52_57_753 | 22:26 |
*** thedodd has quit IRC | 22:30 | |
*** CaptTofu has joined #openstack-infra | 22:30 | |
lifeless | jeblair: I'd like to offer all TripleO ATC's accounts on this cloud; I could just mail -dev but I'm pondering whether something more directed (e.g. direct email) would be good | 22:31 |
*** sarob has quit IRC | 22:31 | |
BobBall | fungi: Is there any way to access a vnc console or similar for VMs in the HP cloud? | 22:33 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: add os-loganalyze to gerrit & zuul https://review.openstack.org/51299 | 22:33 |
sdague | fungi: oops, thanks | 22:33 |
*** changbl has quit IRC | 22:35 | |
fungi | lifeless: if it's decided that an e-mail list of tripleo atcs is warranted, i can generate one on whatever set of repositories and timeframe you want, basically same as we would for a tripleo ptl election | 22:36 |
*** rcleere has quit IRC | 22:36 | |
jeblair | lifeless recommend -dev for now as i'd want to carefully consider giving out email addrs | 22:36 |
fungi | agreed. i'm hesitant as well, but it's a technical possibility | 22:37 |
lifeless | jeblair: ack | 22:37 |
jeblair | i personally think this is a fine use, but i dont want to surprise anyone or break any implied trusts | 22:37 |
fungi | (and you can always just scrape the git commit logs, but that's got the same privacy concerns) | 22:37 |
fungi | BobBall: i believe so, but it's been a while since i needed console access to an hpcloud vm | 22:38 |
jeblair | so lets separately come up with some policy for the future | 22:38 |
sdague | fungi: can I get another look from you on the os_loganalyze add - https://review.openstack.org/51299 ? | 22:38 |
*** datsun180b has quit IRC | 22:38 | |
fungi | sdague: yep, was about to pull it back up | 22:39 |
BobBall | fungi: any ideas how I might do that? the web interface doesn't seem to give me a clue... | 22:39 |
sdague | coolio | 22:39 |
*** CaptTofu has quit IRC | 22:39 | |
fungi | sdague: keep in mind i only -1'd you to game my review positivity stats ;) | 22:39 |
sdague | :) | 22:40 |
*** CaptTofu has joined #openstack-infra | 22:40 | |
clarkb | BobBall I am not sure you can. I had the same problem last I tried | 22:40 |
fungi | that's sucky | 22:41 |
openstackgerrit | Dan Nguyen proposed a change to openstack/requirements: Add pwtools to requirements for password generator https://review.openstack.org/51068 | 22:41 |
* BobBall sighs deeply | 22:41 | |
BobBall | that's a real shame... | 22:41 |
lifeless | jeblair: cool, thanks | 22:41 |
fungi | on the other hand, it seems like a good chunk of nova denial of service issues were related to novnc, so maybe disallowing access there is a defensive measure | 22:42 |
* BobBall bangs his head against the soft fluffy HP cloud | 22:42 | |
lifeless | BobBall: oh? | 22:42 |
BobBall | Struggling trying to get Xen booting nested so we can look at gating tests... and the lack of VNC access means I can't play with boot parameters - once I set them, and it fails, I have to reinstall the machine | 22:43 |
BobBall | it's a right pain | 22:43 |
*** CaptTofu has quit IRC | 22:44 | |
lifeless | BobBall: oh :) | 22:44 |
fungi | i know rackspace provides a console. on the down side the reason i know that is because of having to frequently try to troubleshoot crashed/hung/dead virtual machines | 22:44 |
lifeless | BobBall: erm, I meant oh :( | 22:44 |
lifeless | BobBall: do you have xen booting locally using kvm ? | 22:44 |
lifeless | BobBall: could you just upload a custom image? | 22:44 |
BobBall | we've had it working, yes | 22:44 |
fungi | lifeless: via that awesome glance service they offer their customers ;) | 22:45 |
lifeless | fungi: yup, we have that | 22:45 |
BobBall | not seen that upload a custom image? | 22:45 |
fungi | lifeless: is it no longer in beta? | 22:45 |
lifeless | fungi: it's in public beta still I believe | 22:45 |
jerryz | jgriffith: can i file a bug? | 22:46 |
fungi | well, public beta is way better than secret beta. that's something rackspace still hasn't provided | 22:46 |
BobBall | lifeless: how would I do that? | 22:46 |
jgriffith | jerryz: the bug that you pointed to is valid. Just need to add cinder and possibly glance but not sure yet | 22:46 |
jgriffith | jerryz: I'll have to get back to it here when I have some more time | 22:47 |
BobBall | fungi: RS cloud is even less fun - in theory it's doable but in practice we need an HVM linux guest which is a pain to get hold of with RS cloud :P | 22:47 |
* fungi nods | 22:47 | |
jgriffith | jerryz: feel free to add Cinder to the projects, I don't think it's an infra bug that's for sure | 22:47 |
BobBall | this is the joy of nested virt... | 22:48 |
lifeless | BobBall: hardware assisted virt will be disabled in the kvm vms though surely | 22:48 |
lifeless | BobBall: go to https://account.hpcloud.com/services | 22:49 |
lifeless | BobBall: select us east in the beta section and request access | 22:49 |
lifeless | BobBall: then once you get that, you can ask for glance access too | 22:50 |
BobBall | great, thanks lifeless | 22:50 |
lifeless | BobBall: it was about 24 hour turnaround when I got it enabled on the -infra account | 22:50 |
lifeless | though I don't think they've done anything with it:P | 22:50 |
BobBall | beta request sent :) | 22:50 |
lifeless | BobBall: I'd be delighted to help you get a physical test environment up, if you guys have machines - we should be able to use nova baremetal + nodepool to get you d-g style instances of actual xen deployed pretty easily | 22:52 |
BobBall | we do - although not nearly the number of machines that -infra use for the gate :) | 22:54 |
sdague | clarkb, jeblair: either of you good with putting this through https://review.openstack.org/#/c/51299/ ? then we could get the gerrit core team set, and I can make changes on that side | 22:54 |
BobBall | virtualisation should work - it _really_ should... | 22:55 |
jgriffith | jerryz: cool.. thanks! | 22:56 |
*** dcramer_ has joined #openstack-infra | 22:56 | |
fungi | sdague: clarkb seemed basically okay with the previous patchset in irc. i'm okay approving it and will troubleshoot whatever i might overlook | 22:56 |
sdague | fungi: that would be awesome | 22:57 |
lifeless | BobBall: how many concurrent vm's does a gate run need though? | 22:57 |
sdague | then add me + infra-core to the core team in gerrit | 22:57 |
lifeless | BobBall: say one for d-g itself, and some N concurrent test instances: one solid xen machine should be able to support at least 5 or 6 concurrent d-g style tests. | 22:58 |
lifeless | BobBall: (without slowing each test down, I mean) | 22:58 |
sdague | I'm on for about the next 20 mins | 22:58 |
sdague | then it's off to Plan 9 - http://www.bardavon.org/mobile/event_info.php?id=694 | 22:59 |
BobBall | Perhaps - although I figured we needed one host per VM that's running tests - just to ensure there aren't any cross-interactions which might cause problems? | 23:00 |
BobBall | although maybe I don't understand what d-g style tests are :P | 23:00 |
lifeless | BobBall: d-g runs devstack which you'd want configured to talk to xen | 23:00 |
lifeless | BobBall: I don't know xen well; could you have multiple devstacks talking to one xen ? | 23:00 |
BobBall | in theory, sure | 23:01 |
BobBall | but if you have it then there is a risk of one set of tests interacting with another | 23:01 |
lifeless | k | 23:01 |
*** boris-42 has quit IRC | 23:01 | |
BobBall | e.g. if you break the xenserver in a horrible way (or the plugins don't match...) then it might show up as a failure when it shouldn't have | 23:01 |
lifeless | perhaps have it just run nova gates? | 23:01 |
BobBall | That'd be easier for sure | 23:01 |
BobBall | so how many hosts do you think might be needed? | 23:02 |
openstackgerrit | A change was merged to openstack-infra/config: add os-loganalyze to gerrit & zuul https://review.openstack.org/51299 | 23:02 |
lifeless | nova is a pretty big fraction of the changes | 23:03 |
lifeless | but | 23:03 |
lifeless | I don't have a gut feel - clarkb / fungi may well | 23:03 |
*** senk has joined #openstack-infra | 23:03 | |
lifeless | the full gate, remembering my back of envelope figures | 23:04 |
lifeless | was 400 changes in one day | 23:04 |
lifeless | at 30m each | 23:04 |
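Working through lifeless's back-of-envelope figures (400 changes/day at ~30 minutes each), the minimum steady-state concurrency comes out like this; the calculation is a sketch, and deliberately ignores retries, gate resets, and daytime peaks, all of which push the real requirement higher.

```python
import math

# Figures quoted in the log: ~400 changes/day, ~30 min of testing each.
changes_per_day = 400
minutes_per_run = 30

total_test_minutes = changes_per_day * minutes_per_run  # 12000
minutes_per_day = 24 * 60                               # 1440

# Minimum concurrent test environments needed just to keep pace,
# assuming perfectly even arrival of changes over the day.
concurrent_slots = math.ceil(total_test_minutes / minutes_per_day)
print(concurrent_slots)  # 9
```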
BobBall | I see | 23:05 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Change test_queries from logical AND to OR https://review.openstack.org/50160 | 23:05 |
sdague | fungi: so now that it's merged, we just want for the next puppet update to trigger the import? | 23:06 |
BobBall | oh rubbish - just realised it's midnight | 23:06 |
BobBall | I really should get some sleep | 23:06 |
lifeless | BobBall: https://etherpad.openstack.org/tripleo-test-cluster | 23:06 |
lifeless | BobBall: we figured 40 concurrent test environments is sufficient | 23:06 |
lifeless | BobBall: so 40 small machines for xen | 23:06 |
fungi | sdague: yup, and then i'll add you as the initial core group member, and add the infra core group as included | 23:06 |
sdague | fungi: cool | 23:06 |
lifeless | BobBall: perhaps a moonshot chassis fully loaded? | 23:07 |
BobBall | that'd be a very nice way to do it | 23:08 |
*** senk has quit IRC | 23:08 | |
fungi | lifeless: BobBall: if you're just talking about gating load, have a look at http://status.openstack.org/zuul/ and note that each job listed for a change is using an 8gb vm with 4x vcpu | 23:09 |
lifeless | fungi: moonshot is dual core + hyperthreads with 8GB | 23:09 |
fungi | so depending on the project you're gating, maybe around 10ish servers in parallel | 23:09 |
fungi | lifeless: sounds comparable | 23:10 |
lifeless | fungi: right, it's why I suggested it. | 23:10 |
fungi | is that the arm hardfloat version or the atom one? | 23:10 |
lifeless | fungi: there will be higher density cartridges in future, of course | 23:10 |
BobBall | 10 doesn't sound enough to me if I'm honest | 23:10 |
lifeless | fungi: atom, it even has VTx | 23:10 |
fungi | BobBall: i meant 10ish per change you want to test in parallel | 23:10 |
BobBall | I'm very tempted by the moonshot idea | 23:11 |
lifeless | BobBall: fungi means 10 * - 10 servers per commit, but I think he's wrong :) | 23:11 |
BobBall | oh I see | 23:11 |
BobBall | why 10 per commit? | 23:11 |
fungi | i may be. checking the veracity of my assertion now | 23:11 |
lifeless | fungi: do you mean 'tempest runs 10 sub-vms'? | 23:11 |
lifeless | fungi: or do you mean 'zuul schedules 10 jobs' ? | 23:11 |
sdague | are you guys talking about devstack/tempest runs? | 23:11 |
sdague | because our experience is the cpu does matter quite a bit | 23:11 |
BobBall | We're talking about adding a devstack/tempest/xenapi run somehow :) | 23:12 |
sdague | which is why the rax nodes aren't used | 23:12 |
sdague | so atom... not a great idea :) | 23:12 |
fungi | just talking about jobs in general. if you were to replicate *all* of our gating, we use 9 virtual machines in parallel for each iteration of attempting to gate a nova change, for example | 23:12 |
lifeless | sdague: mmm, I'd seriously consider native atom over virtualised $other :> | 23:12 |
lifeless | fungi: right, so thats the wrong way to look at it | 23:13 |
BobBall | ahhh ok | 23:13 |
fungi | not sure what metric BobBall was looking for there | 23:13 |
lifeless | fungi: the way to look at is is we're adding one more job to that set. | 23:13 |
fungi | oh, in that case one per change tested in parallel | 23:13 |
lifeless | fungi: so from 9 vm's to 10, one of which BobBall would be providing in a dedicated xen-capable-environment. | 23:13 |
BobBall | I'm not sure I know either :) | 23:13 |
sdague | lifeless: it seems pretty cpu bound, so virtualized doesn't have much overhead | 23:13 |
lifeless | sdague: tempest is running against qemu vm's | 23:14 |
sdague | lifeless: the qemu vm start times isn't really the issue | 23:14 |
fungi | jenkins02 has been gracefully restarted and is coming up now | 23:14 |
*** sarob has joined #openstack-infra | 23:14 | |
lifeless | sdague: ok; I'll defer to data here. | 23:14 |
lifeless | sdague: just that even cirros can't make the vm's do their stuff well :> | 23:14 |
lifeless | sdague: I would want to investigate a xen-on-moonshot test before writing it off | 23:15 |
sdague | fair, just saying what I've seen. | 23:15 |
lifeless | sdague: these aren't the atoms most folk have seen | 23:15 |
sdague | ok, well even the amd chips in rax give us a 40% slow down compared to the intel chips at hp | 23:16 |
BobBall | well I'd question whether we'd need to run the full set of tempest tests as well - they all pass of course, so that's not the issue, but some of them are entirely independent of the hypervisor driver | 23:16 |
lifeless | http://www8.hp.com/us/en/products/proliant-servers/product-detail.html?oid=5375897#!tab=specs <- the cartridges I'm referring to | 23:16 |
BobBall | the rax chips you were testing on are a fair bit older than the intel ones at HP though | 23:16 |
sdague | lifeless: what's the L3 look like on those? | 23:16 |
sdague | BobBall: fair | 23:16 |
lifeless | http://ark.intel.com/products/series/71265/Intel-Atom-Processor-S1200-Product-Family-for-Server | 23:16 |
sdague | I'd say get some data on a real system first though | 23:17 |
lifeless | sdague: 1 MB | 23:17 |
lifeless | sdague: yes, +1 on getting real data | 23:17 |
sdague | so, I'd be suspicious then. We've seen some pretty strong correlation between L3 size and speed here. | 23:18 |
sdague | but some runs would be good | 23:18 |
BobBall | Do you have access to a moonshot system lifeless? I can probably get access but it's likely to take a while | 23:18 |
sdague | ok, movie time | 23:18 |
lifeless | BobBall: not at the moment, but I know folk who do :/ | 23:18 |
BobBall | okay | 23:18 |
*** fifieldt has joined #openstack-infra | 23:19 | |
fungi | sdague: if it's wood's original plan 9, one of my favorites ;) | 23:19 |
BobBall | okay I'll check with our HP blokey | 23:19 |
lifeless | BobBall: I would suggest, if doing this is a real possibility, that we go in the front door and get a sales person involved - the sales folk have ready access to moonshot for customer evaluations | 23:19 |
lifeless | BobBall: (e.g. fully populated 45 cartridge + two switch chassis) | 23:19 |
BobBall | Maybe. I know someone who has been talking about moonshot so I'll have a few words with him first | 23:20 |
BobBall | and try the glance upload route too :) | 23:21 |
lifeless | cool | 23:21 |
BobBall | all sorts of fun! | 23:21 |
lifeless | if you run into a wall, let me know | 23:21 |
lifeless | I have some interactions with moonshot teams | 23:21 |
BobBall | perfect, thanks. | 23:21 |
BobBall | *sleep* | 23:22 |
*** BobBall is now known as BobBallAway | 23:22 | |
lifeless | gnight! | 23:22 |
*** marktraceur is now known as FreeThaiFood | 23:23 | |
fungi | jeblair: anecdotal but worth watching for next time, we had a great many more devstack jobs end up on jenkins02 as soon as it came up than were running on jenkins01. like it got favored for some reason (maybe accumulated shares from while it was unreachable?) | 23:26 |
fungi | at the moment there are about 5 devstack jobs running on jenkins01 and nearly 50 on jenkins02 | 23:27 |
fungi | but jobs still seem to be running and completing successfully | 23:29 |
fungi | i'll check back in on it in a bit | 23:29 |
*** pentameter has quit IRC | 23:33 | |
*** mriedem has joined #openstack-infra | 23:37 | |
*** nati_uen_ has joined #openstack-infra | 23:45 | |
*** nati_ueno has quit IRC | 23:46 | |
*** FreeThaiFood is now known as marktraceur | 23:48 | |
*** hogepodge has quit IRC | 23:49 | |
*** rnirmal has quit IRC | 23:54 | |
*** vipul is now known as vipul-away | 23:57 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!