anteaya | mestery: and you can vote on the sandbox repo: http://git.openstack.org/cgit/openstack-dev/sandbox/ | 00:00 |
---|---|---|
SpamapS | clarkb: that is what nodepool thinks. The cloud thinks there are no nodes. | 00:00 |
SpamapS | clarkb: we were unreachable from the internet for about 16 hours so that would be why they may be out of sync :-P | 00:00 |
clarkb | SpamapS: gotcha. so I should kick it | 00:00 |
fungi | SpamapS: fungi is fed and responding once again | 00:00 |
jeblair | clarkb: kick what? | 00:00 |
SpamapS | fungi: ci wizard needs food | 00:00 |
anteaya | mestery: best case scenario is you have some commenting history on neutron and voting history on some sandbox patches and then apply at a neutron weekly meeting for voting rights | 00:00 |
fungi | catching up nowish | 00:00 |
SpamapS | fungi: tripleo shot the food | 00:00 |
clarkb | gah we made deletes async | 00:01 |
clarkb | jeblair: I was going to delete all the nodes so that new ones would build | 00:01 |
clarkb | jeblair: but that is async so it may not help much | 00:01 |
anteaya | mestery: if markmcclain tells infra change OpenDaylight Jenkins to the voting group, then you are changed in gerrit and you can vote | 00:01 |
jeblair | clarkb: ok. yeah, the ones in delete state should be being deleted. | 00:01 |
jeblair | clarkb: the ones in building will switch over after 8 hours | 00:01 |
clarkb | jeblair: so leave it as is? | 00:02 |
jeblair | clarkb: i'd look into why the deleted nodes aren't deleting | 00:02 |
*** fbo_away is now known as fbo | 00:02 | |
jeblair | (but i don't think anything needs "kicking") | 00:03 |
mordred | dhellmann: what's versionutils? | 00:03 |
*** blamar has quit IRC | 00:03 | |
fungi | SpamapS: from what i hear, someone shot the food | 00:04 |
fungi | SpamapS: oh, you beat me to that joke | 00:04 |
fungi | SpamapS: tripleo warrior is about to die? | 00:04 |
SpamapS | apparently fungi is async too | 00:04 |
marun | seen in an experimental jenkins job that attempts to run tox as the tempest user: mkdir('/opt/stack/new/neutron/.tox',) | 00:04 |
fungi | indeed. scrollback in chronological order | 00:04 |
fungi | caught up now | 00:04 |
marun | sorry, permission error on that command | 00:05 |
clarkb | 2014-03-06 00:04:33,671 DEBUG nodepool.NodePool: Deficit: tripleo-precise: 0 (start: 0 min: 35 ready: 35) | 00:05 |
clarkb | I think that means that 35 nodes are 'ready' but I don't see them | 00:05 |
jeblair | clarkb: building counts as ready | 00:05 |
clarkb | jeblair: gotcha | 00:06 |
marun | Should tox be targeting a different directory or is there a user that can be both sudo and write to the neutron path? | 00:06 |
jeblair | clarkb: (otherwise, it would always be building) | 00:06 |
fungi | clarkb: SpamapS: still 98 in a delete state. i'll try again to clear them if the cloud is really up this time, for reals | 00:06 |
SpamapS | I'm fine with "wait 8 hours" ... but I was hoping we'd have some idea if the attempts at new nodes would work in 8 hours | 00:06 |
jeblair | marun: are you asking "how to sudo in unit tests?" | 00:06 |
fungi | reattempting to delete --now the nodes in a delete state in the tripleo cloud | 00:07 |
marun | jeblair: There's already a job that should have sudo privileges. | 00:07 |
marun | jeblair: I'll find the config, hold on. | 00:07 |
SpamapS | fungi: there are no non-template nodes in the tripleo cloud that I can see | 00:07 |
marun | jeblair: https://review.openstack.org/#/c/66967/ | 00:07 |
fungi | SpamapS: this does not surprise me | 00:07 |
clarkb | jeblair: what if we delete the rows in the nodepool db manually? | 00:08 |
clarkb | jeblair: since SpamapS says there is nothing on his end | 00:08 |
SpamapS | we have "monitoring" on the tripleo cloud now... and we think we've solved our hardware and driver issues, so hopefully this will be less of a problem going forward. | 00:08 |
jeblair | marun: okay, not unit tests then. carry on. :) | 00:08 |
fungi | SpamapS: nodepool is still mostly written based on the assumption that clouds don't fall offline, or when they do they at least do so briefly | 00:08 |
SpamapS | 16 hours isn't brief? | 00:09 |
SpamapS | as Ng says, we have nine fives of uptime. | 00:09 |
fungi | (and don't lose track of what they had in the process either) | 00:09 |
jeblair | SpamapS: sure it is, and 8 hours to recover is brief too. :) | 00:09 |
marun | jeblair: any thoughts as to how this job could be made to run? It's failing when it tries to run tox because the user running the job doesn't have write access to the neutron dir. | 00:09 |
fungi | SpamapS: that all depends on where you keep your nines | 00:09 |
marun | jeblair: line 28 in https://review.openstack.org/#/c/66967/1/modules/openstack_project/files/jenkins_job_builder/config/neutron-functional.yaml | 00:09 |
SpamapS | fungi: read it again. We have _fives_ | 00:09 |
fungi | SpamapS: 50.99999% uptime? ;) | 00:09 |
fungi | ohhhh | 00:10 |
SpamapS | 55.5555555 :) | 00:10 |
Ng | :D | 00:10 |
fungi | NINE FIVES | 00:10 |
fungi | got it | 00:10 |
Ng | best SLA troll ever | 00:10 |
jeblair | marun: you may want to look at the devstack-gate script. briefly; the jenkins user has sudo access. | 00:10 |
jeblair | marun: i can't dig deeply into it with you right now, sorry. | 00:10 |
marun | jeblair: fair enough | 00:10 |
kevinbenton | with the random failures of tempest in the gate, has there been any discussion of using more nodes to test the top faster (e.g. divide tempest run list across several nodes) rather than building up a long chain of tested patches that has a high probability of being broken anyway? | 00:10 |
*** dims has joined #openstack-infra | 00:10 | |
*** rpodolyaka has joined #openstack-infra | 00:11 | |
*** rfolco has joined #openstack-infra | 00:11 | |
jeblair | marun: i _think_ that means that you should be able to "sudo foo" from any of the hooks and it should work | 00:11 |
fungi | kevinbenton: the quixotiists amongst us would rather tilt at the windmills which might increase the number of changes which pass tests | 00:11 |
* fungi is a quixotiist | 00:12 | |
*** cody-somerville has quit IRC | 00:12 | |
StevenK | Are openstack/melange and openstack/python-melangeclient supposed to be un-clone-able ? | 00:12 |
fungi | quixoticist? | 00:12 |
marun | jeblair: it's not even getting to the point of invoking anything sudo, though. tox fails trying to create /opt/stack/new/neutron/.tox | 00:13 |
clarkb | kevinbenton: wouldn't that ignore the fails | 00:13 |
clarkb | kevinbenton: it is important to remember that these failures are real bugs | 00:13 |
marun | jeblair: anyway, i'll figure it out | 00:13 |
fungi | StevenK: they're supposed to be pretty broken and ancient cruft nobody uses any longer | 00:13 |
kevinbenton | clarkb: no, it would just detect them faster | 00:13 |
kevinbenton | fungi: i understand :-) | 00:13 |
StevenK | fungi: They will disappear from ls-projects at some point, then? | 00:13 |
*** Ryan_Lane has quit IRC | 00:13 | |
fungi | StevenK: we do not delete history | 00:14 |
clarkb | kevinbenton: not necessarily as we would lose intertest interactions | 00:14 |
kevinbenton | clarkb: oh i see. is that the source of some of the bugs? | 00:14 |
anteaya | kevinbenton: multiple bug sources | 00:14 |
fungi | StevenK: though if they're unclonable, that might already be having the same effect | 00:14 |
fungi | StevenK: i'll test | 00:14 |
anteaya | races, merge optimizations, host optimizations | 00:15 |
clarkb | kevinbenton: yes I think large portions of them are nova did this thing then later stuff goes ugh | 00:15 |
fungi | StevenK: yep, pretty darn broken | 00:15 |
kevinbenton | clarkb, anteaya: i see. i was imagining a node for each high-level tempest group or something along those lines | 00:16 |
anteaya | 5 just merged | 00:16 |
fungi | StevenK: checking into why | 00:16 |
anteaya | kevinbenton: how is that different from what we are currently doing | 00:16 |
anteaya | we have a node for each running test job | 00:16 |
kevinbenton | anteaya: broken down slightly further. like one that runs compute tests, one runs network tests, one runs volume tests, etc | 00:17 |
*** fbo is now known as fbo_away | 00:17 | |
*** alexpilotti_ has joined #openstack-infra | 00:17 | |
StevenK | fungi: I suppose a better question would have been why can't I clone melange and its client | 00:17 |
fungi | StevenK: oh... i see why | 00:18 |
fungi | StevenK: we have gerrit configured to replicate it everywhere, but our git server farm only creates repositories listed in http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/review.projects.yaml | 00:19 |
fungi | StevenK: so you can clone from http://review.openstack.org/p/openstack/melange.git but not from http://git.openstack.org/openstack/melange.git | 00:19 |
*** alexpilotti has quit IRC | 00:19 | |
*** Ryan_Lane has joined #openstack-infra | 00:20 | |
fungi | StevenK: i agree we should fix that in one way or another | 00:20 |
*** andreaf has quit IRC | 00:20 | |
StevenK | fungi: Hmmmm. Those two things do seem at odds. | 00:20 |
*** sandywalsh has quit IRC | 00:21 | |
fungi | StevenK: please file a bug against https://bugs.launchpad.net/openstack-ci/+filebug | 00:21 |
fungi | StevenK: there are a couple of possible ways to solve this, but it's worth some debate probably | 00:21 |
*** alexpilotti_ has quit IRC | 00:21 | |
fungi | StevenK: the main thing i dislike about the current situation is that we're mirroring repositories to github which we're not serving from our own git server farm | 00:23 |
*** markwash has quit IRC | 00:23 | |
StevenK | Yeah | 00:23 |
fungi | and that sends entirely the wrong message in my opinion | 00:23 |
jeblair | fungi: github is where the crufty stuff is? :) | 00:23 |
fungi | jeblair: let's stop mirroring current projects to github and let them host our abandoned refuse ;) | 00:24 |
StevenK | fungi: https://bugs.launchpad.net/openstack-ci/+bug/1288485 | 00:24 |
fungi | StevenK: thanks! | 00:24 |
*** mrodden has joined #openstack-infra | 00:24 | |
*** markmcclain has quit IRC | 00:25 | |
fungi | StevenK: great summary | 00:25 |
jeblair | fungi: i'd really like to get gerritbot working.. | 00:25 |
jeblair | fungi: i don't see ServerNotConnectedError in any logs except today's | 00:25 |
fungi | jeblair: i'm happy to work on that now | 00:25 |
jeblair | fungi: my inclination is to revert the change i made. at least, if i'm charged with fixing it, that's the first thing i'd do | 00:26 |
jeblair | fungi: because i just want it to work right now, i'm chasing too many other issues to open a new one... | 00:26 |
fungi | poll... downgrade irclib, release and upgrade gerritbot, or restart it and wait for it to (probably) fail again? | 00:26 |
fungi | i'm fine with backing down irclib | 00:26 |
*** packet has quit IRC | 00:26 | |
kevinbenton | anteaya: the reasoning being that with a non-negligible probability of each job failing, it's better to focus more resources towards the top of the gate to fail faster rather than the current depth where a failure wastes at least the same amount of compute resources | 00:26 |
*** mgagne has joined #openstack-infra | 00:26 | |
jeblair | fungi: but if someone else really wants to jump down the rabbithole, i'm not objecting... | 00:26 |
jeblair | i just think we should do one or the other soon. :) | 00:27 |
fungi | jeblair: i was fine with it while it sounded like clarkb was volunteering ;) | 00:27 |
jeblair | i'm volunteering to downgrade irclib since i broke it by upgrading | 00:27 |
*** yamahata has joined #openstack-infra | 00:27 | |
fungi | jeblair: i have absolutely no objections | 00:27 |
jeblair | fungi, clarkb: so one of you pre-empt me now if you want to do something else. :) | 00:27 |
fungi | i had not done a time correlation on the traceback occurrence, and agree that it sounds related to the irclib upgrade | 00:28 |
clarkb | jeblair: fungi: I am ok with it too | 00:28 |
clarkb | jeblair: was e-r involved as well? it is having trouble but doesn't seem to be ircbot related | 00:28 |
jeblair | clarkb: it's online now | 00:28 |
fungi | clarkb: we probably didn't upgrade irclib on status.o.o? (does e-rbot use irclib?) | 00:29 |
clarkb | fungi: I think it does | 00:29 |
fungi | hmm... maybe then | 00:29 |
jeblair | it actually should be pretty new anyway on that host | 00:30 |
clarkb | yeah it probably isn't related. looks like jogo just found a TypeError somewhere | 00:30 |
fungi | right now pip freeze claims irc==8.5.4 on both systems | 00:30 |
*** vkozhukalov has joined #openstack-infra | 00:32 | |
dansmith | clarkb: do we have any plotting of runtimes of nova unit tests? | 00:32 |
pleia2 | fungi: commented on bug 1288485 - if we can do a "sync once" of old projects that won't need to be updated (since they are old, and not updated anymore) one of my solutions on the git server side should work fine | 00:32 |
dansmith | clarkb: trying to debug something I'm seeing only in the check queue that is on something threading/event/timeout related and wanted to see if it's regularly taking longer (timing out) than other things | 00:33 |
*** sabari3 has quit IRC | 00:33 | |
clarkb | dansmith: no, but sdague has a thing that could be used for that | 00:33 |
dansmith | okay | 00:33 |
*** sabari has joined #openstack-infra | 00:33 | |
fungi | pleia2: maybe. i did just rebuild all the git servers this week and relied on create-cgitrepos to take care of all that for me (worked great by the way) | 00:33 |
fungi | pleia2: i'd be a little hesitant to add more manual steps | 00:34 |
pleia2 | fungi: ah yeah, true for rebuilding | 00:34 |
*** denis_makogon has quit IRC | 00:34 | |
fungi | pleia2: for reference, the (simplified) replacement steps look like https://etherpad.openstack.org/p/git-server-rebuild | 00:34 |
pleia2 | fungi: oy | 00:34 |
*** bhuvan has joined #openstack-infra | 00:34 | |
*** bhuvan_ has joined #openstack-infra | 00:34 | |
fungi | pleia2: it was easier to use a cut-and-paste document so i could do it drunk^Weasily | 00:35 |
pleia2 | fungi: so if we create a review.projects.old.yaml to add old stuff to we can edit create-cgitrepos to create the old config too | 00:35 |
pleia2 | I don't remember how the gerrit side of syncing it all works | 00:36 |
fungi | pleia2: i think the simple solution is to just go ahead and add the couple of missing/old projects to review.projects.yaml and stop worrying | 00:37 |
pleia2 | yeah | 00:37 |
fungi | whether those projects appear in one yaml file or separate files isn't really making a substantive statement about their relative viability | 00:38 |
fungi | so the least complicated solution is to fix the list and move on | 00:38 |
pleia2 | is there a good reason to remove them from that yaml file? | 00:38 |
fungi | i don't really see leaving them out of that file as accomplishing anything, no | 00:38 |
* pleia2 nods | 00:38 | |
*** bhuvan___ has quit IRC | 00:39 | |
*** bhuvan__ has quit IRC | 00:39 | |
*** bhuvan_ has quit IRC | 00:39 | |
*** bhuvan has quit IRC | 00:39 | |
*** jnoller has joined #openstack-infra | 00:39 | |
*** MarkAtwood has quit IRC | 00:40 | |
fungi | if we want to make a dead projects boneyard down the road, well, we can actually do something to implement that separation in a sane way | 00:40 |
fungi | but until we do, it's my opinion that there's a lot less work involved in not caring | 00:40 |
*** openstackgerrit has joined #openstack-infra | 00:40 | |
jeblair | irc for openstackgerrit is downgraded | 00:41 |
fungi | jeblair: thanks! | 00:41 |
StevenK | fungi: jogo votes for openstack-boneyard | 00:41 |
pleia2 | hehe | 00:41 |
jeblair | np, sorry for that. maybe we should just install the new version on eavesdrop.o.o and leave that one alone | 00:42 |
pleia2 | jeblair: thanks for trying to fix it :) | 00:42 |
fungi | pleia2: and if someone proposes a patch to add them to that file for now (with whatever scary admonishing description seems appropriate), i won't hesitate to +2 that | 00:42 |
pleia2 | any bright ideas on how to get a list of "all of them"? | 00:42 |
jeblair | pleia2: i'm pretty sure the new version will work, i just don't want to find out what _won't_ work about it right now | 00:42 |
fungi | pleia2: gerrit ls-projects | 00:42 |
*** bhuvan has joined #openstack-infra | 00:43 | |
*** bhuvan_ has joined #openstack-infra | 00:43 | |
fungi | pleia2: and if that was vague, i meant 'ssh -p 29418 review.openstack.org gerrit ls-projects' | 00:44 |
dansmith | clarkb: do you know if sdague has a dansmith-is-an-idiot detector? | 00:44 |
pleia2 | fungi: no worries, I got that :) | 00:44 |
fungi | dansmith: come closer and we'll find out | 00:44 |
dansmith | hehe | 00:44 |
*** adrian_otto has quit IRC | 00:44 | |
*** sarob_ has quit IRC | 00:45 | |
*** sarob_ has joined #openstack-infra | 00:45 | |
*** hogepodge has quit IRC | 00:45 | |
*** amcrn has quit IRC | 00:46 | |
fungi | i are mailing list fail. it's now march 6th utc and i've found time to skim and delete threads starting on the -dev ml up to february 27th now | 00:46 |
pleia2 | nice | 00:47 |
fungi | i'm weighing declaring ml bankruptcy against knocking off for the evening and hoping tomorrow finds me somehow less busy | 00:47 |
*** dprince has quit IRC | 00:48 | |
fungi | i'll do some code review first so i don't feel completely useless | 00:48 |
*** MarkAtwood has joined #openstack-infra | 00:48 | |
*** zhiwei has joined #openstack-infra | 00:48 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add a script to manage IRC perms https://review.openstack.org/78483 | 00:49 |
*** derekh has quit IRC | 00:49 | |
jeblair | btw, i ran that for real on the channels in the config file | 00:49 |
fungi | that one ^ was on the priority list! ;) | 00:49 |
*** wchrisj has quit IRC | 00:49 | |
*** sarob_ has quit IRC | 00:49 | |
jeblair | fungi: that's a new one -- it's the followup that actually makes changes | 00:49 |
fungi | okay, both are on the priority list in that case | 00:50 |
*** sarob_ has joined #openstack-infra | 00:50 | |
jeblair | next up is add 50 more channels to the config and then finally normalize everything | 00:50 |
*** wchrisj has joined #openstack-infra | 00:50 | |
jeblair | reed: ttx: you two are about to get ops on all openstack channels. anyone else you think should be a global op (to deal with spammers, etc?) | 00:50 |
jeblair | (by about to, i mean within the next few days maybe) | 00:51 |
StevenK | jeblair: May I suggest lifeless for the other side of the world problem? | 00:51 |
*** jcoufal has quit IRC | 00:52 | |
fungi | StevenK: has a point... ttx and reed and infra give us okay coverage over one hemisphere, roughly | 00:52 |
fungi | at least somebody in apac would probably be a good addition | 00:53 |
*** rpodolyaka has quit IRC | 00:53 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add query for neutron db migration conflicts. https://review.openstack.org/78435 | 00:54 |
*** MarkAtwood has quit IRC | 00:54 | |
fungi | i have no idea if lifeless has time to be (or interest in being) irc police however | 00:54 |
StevenK | fungi: lifeless already has experience with #ubuntu-* ops | 00:55 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/elastic-recheck: Don't get loggers until after log setup https://review.openstack.org/78485 | 00:55 |
StevenK | Which I why I put him forward | 00:55 |
StevenK | s/Which I/Which is/ | 00:55 |
fungi | StevenK: might be all the more reason he'd want to decline the honor ;) | 00:55 |
StevenK | fungi: Possibly. | 00:56 |
StevenK | fungi: ITYM 'honour'? :-P | 00:56 |
anteaya | kevinbenton: now I understand the design you are proposing better | 00:56 |
fungi | s/honor/responsibility/ | 00:56 |
StevenK | Bwahah | 00:56 |
* StevenK tries to add a 'u' to responsibility, fails | 00:56 | |
anteaya | kevinbenton: I'm stuck on what you mean by non-negligible probability of each job failing | 00:56 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Use the correct qualname for recheckwatchbot https://review.openstack.org/78486 | 00:57 |
fungi | StevenK: just pronounciate it with your best paul hogan impersonation | 00:57 |
clarkb | jogo: ^ and that will fix the other issue I have found | 00:57 |
StevenK | fungi: "That's not a knife!" | 00:57 |
*** bhuvan__ has joined #openstack-infra | 00:58 | |
*** bhuvan___ has joined #openstack-infra | 00:58 | |
jeblair | clarkb: any chance we can do https://review.openstack.org/78485 as a revert? | 00:58 |
*** vkozhukalov has quit IRC | 00:58 | |
clarkb | jeblair: looking | 00:58 |
jeblair | clarkb: https://review.openstack.org/#/c/66564/ is the change | 00:58 |
clarkb | jeblair: maybe? was that inserted in a weird way? | 00:58 |
clarkb | jeblair: thanks | 00:58 |
kevinbenton | anteaya: i mean that there is still a decent chance that a job will fail on something like check-tempest-dsvm-neutron-isolated | 00:58 |
fungi | even just fashioning it as a revert would make the situation a little more clear, i think | 00:59 |
clarkb | jeblair: sort of, there are places in that that are broken too | 00:59 |
jeblair | clarkb: oh? | 00:59 |
clarkb | jeblair: and the LOG object has been cargo culted into other places | 00:59 |
lifeless | I am ok with being an IRC op | 00:59 |
clarkb | jeblair: the class members for log load at import time too | 00:59 |
*** sarob__ has joined #openstack-infra | 01:00 | |
jeblair | clarkb: i'm fine with it being an instance var, but still, why is that a problem? | 01:00 |
clarkb | jeblair: because incremental loading of logging configs doesn't work well | 01:00 |
clarkb | jeblair: so we have to make sure that setup_logging executes before we grab the loggers | 01:00 |
clarkb | also I love the commit message on 66564 | 01:01 |
*** bhuvan_ has quit IRC | 01:01 | |
*** prad has joined #openstack-infra | 01:01 | |
jeblair | clarkb: ok, i got that... i guess i'm confused as to why we've never seen an issue with class-level vars | 01:01 |
*** wenlock has quit IRC | 01:01 | |
clarkb | jeblair: hrm, I may be misreading the python logging docs | 01:01 |
*** yamahata_ has quit IRC | 01:01 | |
clarkb | in which case I don't know why the module level LOG isn't working | 01:01 |
lifeless | fungi: any chance you can peek at nodepool logs? | 01:02 |
clarkb | but that may be explained by https://review.openstack.org/78486 | 01:02 |
*** yamahata_ has joined #openstack-infra | 01:02 | |
*** bhuvan has quit IRC | 01:02 | |
jeblair | clarkb: basically, yeah, i thought that class level vars were a solution to the problem you are describing | 01:02 |
*** zhiwei has quit IRC | 01:02 | |
*** sarob_ has quit IRC | 01:02 | |
fungi | lifeless: which nodepool logs. image logs or operating logs? | 01:02 |
clarkb | jeblair: oh so it is a problem at module level but not class level? | 01:02 |
anteaya | kevinbenton: isolated jobs for neutron have been removed: https://review.openstack.org/#/c/71947/ | 01:02 |
jeblair | clarkb: that is a dupe of https://review.openstack.org/#/c/77629/ | 01:02 |
lifeless | fungi: well there is a template that's been building for some time; and we don't have any tripleo slaves | 01:02 |
clarkb | jeblair: thanks | 01:03 |
*** bhuvan___ has quit IRC | 01:03 | |
*** bhuvan__ has quit IRC | 01:03 | |
clarkb | jeblair: clearly I should've asked you about e-r logging before I debugged :) | 01:03 |
openstackgerrit | Elizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add back old projects to replicate to git.o.o https://review.openstack.org/78490 | 01:03 |
pleia2 | fungi: that do? ^ | 01:03 |
jeblair | clarkb: call before you dig? | 01:03 |
clarkb | jeblair: indeed | 01:03 |
clarkb | there could be danger below | 01:03 |
pleia2 | (fortunately it is only those two that are in ls-projects but not on git.o.o) | 01:04 |
lifeless | fungi: so there are arguably two issues we'd like to fix, SpamapS and I are hanging around in the hope we'll get it fixed | 01:04 |
fungi | lifeless: it'll be faster for me to sift through the logs. looking now | 01:04 |
kevinbenton | anteaya: whoops. gate-tempest-dsvm-neutron-pg was the one that died on me this morning | 01:04 |
*** sarob__ has quit IRC | 01:05 | |
anteaya | kevinbenton: on what failure? can I see the logs? | 01:05 |
clarkb | jeblair: I am still trying to wrap my head around why class level loggers work | 01:05 |
kevinbenton | anteaya: http://logs.openstack.org/75/73575/20/gate/gate-tempest-dsvm-neutron-pg/595632b/ | 01:05 |
*** SumitNaiksatam has quit IRC | 01:05 | |
clarkb | jeblair: those statements are executed at import time just like the module level loggers right? | 01:05 |
*** SumitNaiksatam has joined #openstack-infra | 01:05 | |
jeblair | clarkb: i'm curious too, but i think i need to forage for food now | 01:05 |
kevinbenton | anteaya: looks like the cirros ssh timeout bug | 01:06 |
clarkb | jeblair: but I think a revert would be cleaner than a second commit to do additional cleanup in the cargo cult areas | 01:06 |
jeblair | fungi: heads up, i talked to someone at rax today who has a team of folks wanting to build a gerrit replacement on openstack principles and in an open manner like storyboard; he should be posting to the ml soon | 01:07 |
jeblair | clarkb: ^ | 01:07 |
clarkb | jeblair: I am not sure what to think of that | 01:07 |
jeblair | i think it's pretty exciting | 01:07 |
clarkb | on one hand yay sane upstream. on the other this thing over here works | 01:07 |
fungi | jeblair: awesome--they approached me a few days ago as well and i told them "mailing list" too | 01:07 |
fungi | jeblair: but sounded exciting! | 01:07 |
clarkb | gerrit does not have the same sort of problems that the bug tracker world has | 01:08 |
*** atiwari has quit IRC | 01:08 | |
*** sabari has quit IRC | 01:08 | |
jeblair | clarkb: it has a different set... we've certainly had our issues with it... | 01:08 |
fungi | jeblair: i started to point them at storyboard as an example of a related/parallel effort and was pleased to discover they were already aware and patterning some of their execution plan after that (and wanted to see them integrate well) | 01:08 |
clarkb | jeblair: right but the set it has don't necessarily scream fork to me | 01:09 |
clarkb | or even reinvent | 01:09 |
* clarkb points at our effort to stop running a fork as evidence | 01:09 | |
fungi | clarkb: i get the impression that one of the reasons it's happening is that a code review system was proposed internally, gerrit was held up as best-of-breed and some one (or ones) panned it because it was written in java | 01:10 |
anteaya | kevinbenton: yes also it ran on hpcloud-az2, which is displaying some very difficult behaviours around image builds | 01:10 |
*** mwagner_lap has joined #openstack-infra | 01:10 | |
kevinbenton | anteaya: so what i'm getting at is if every job has something like a 1/6 chance of dying, a job 6 deep in the queue has like a 66% chance of failing or having one of its parents fail | 01:10 |
jeblair | clarkb: i'm sure you'll admit we've had considerable trouble getting our patches in upstream. | 01:10 |
clarkb | jeblair: yup no argument there | 01:10 |
anteaya | kevinbenton: but why is it dying? | 01:10 |
kevinbenton | anteaya: ah, maybe i'm overestimating the failures like these timeouts then | 01:11 |
clarkb | kevinbenton: way overestimating | 01:11 |
anteaya | kevinbenton: the reasons for the test failures are myriad and moving | 01:11 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/storyboard: Handle yaml files updates https://review.openstack.org/78491 | 01:11 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/storyboard: Make project description longer https://review.openstack.org/78492 | 01:11 |
clarkb | kevinbenton: we are really stable right now oddly | 01:11 |
clarkb | usually a bunch of broken happens during feature freeze | 01:11 |
anteaya | due to load | 01:12 |
anteaya | flushing out bugs we never knew we had, because load | 01:12 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Load storyboard projects from projects.yaml https://review.openstack.org/78249 | 01:12 |
jeblair | clarkb: anyway, i think the case for this will come out once we start looking at it. i mostly wanted to convey that i thought the way they want to approach it is really good. | 01:12 |
clarkb | jeblair: I agree that the approach is good | 01:12 |
kevinbenton | clarkb, anteaya: is there any way to see stats on the current ratio of rejects to accepts? | 01:12 |
clarkb | jeblair: and like the idea of an actually responsive upstream for a code review tool | 01:13 |
anteaya | kevinbenton: I'm not saying that talking about improvements isn't a good idea, we welcome those discussions | 01:13 |
clarkb | especially when you consider stuff like change screen 2 | 01:13 |
clarkb | but, there are a bunch of tools out there including gerrit and they work supposedly | 01:13 |
*** Ryan_Lane has quit IRC | 01:13 | |
dstufft | if killing gerrit means there is a review api that doesn't open a tab for each file I'm for it | 01:13 |
dstufft | ui | 01:13 |
dstufft | not api | 01:13 |
anteaya | kevinbenton: you might like http://graphite.openstack.org/ | 01:13 |
dstufft | I'm dumb today | 01:13 |
anteaya | I wish I knew how to use it better | 01:14 |
clarkb | dstufft: its ok, so am I | 01:14 |
jeblair | dstufft: someone wrote a patch for that but it didn't get included because of the gerrit CLA. | 01:14 |
clarkb | dstufft: we could start a support group | 01:14 |
anteaya | kevinbenton: jeblair sdague fungi clarkb and jogo make the nicest graphs I see | 01:14 |
jeblair | anyway, food now. | 01:14 |
* fungi thinks we're throwing stones in glass houses complaining about a cla ;) | 01:14 | |
jeblair | fungi: i complain about ours more than anyone else's. | 01:15 |
dstufft | jeblair: a patch that isn't applied is pretty useless to me unless I convince people to apply said patch :D | 01:15 |
anteaya | kevinbenton: maybe on an after ff day like next week one of them might be able to make some suggestions | 01:15 |
StevenK | jogo: But this does? | 01:15 |
fungi | jeblair: fair point! | 01:15 |
jogo | silly StevenK | 01:15 |
dstufft | I never understood why people care about a CLA | 01:15 |
dstufft | tbh | 01:15 |
jeblair | dstufft: it won't apply now. it's too old. i mostly wanted you to have another piece of anecdotal evidence that CLAs are bad for free software projects. i think it's important everyone knows that. :) | 01:15 |
jeblair | this is why i care. | 01:15 |
dstufft | jeblair: why are they bad, because some people won't sign them? ;P | 01:15 |
fungi | (especially people working on free software projects) | 01:16 |
jeblair | dstufft: because some people won't sign them. some people can't sign them. some people need 2 years to get their employers to sign them. | 01:16 |
jeblair | dstufft: all for nothing. | 01:16 |
fungi | dstufft: they complicate things. (probably) entirely unnecessarily | 01:16 |
fungi | though they do make lawyers feel better about themselves | 01:16 |
StevenK | Stuff and other stuff jogo | 01:17 |
anteaya | 13 in the gate and 16 in post | 01:17 |
anteaya | woo | 01:17 |
jeblair | fungi: true, one thing they certainly do is increase billable hours. | 01:17 |
jogo | anteaya: it looked like the neutron stuff was the culprit in the end | 01:17 |
StevenK | Soon to be 10, I think | 01:18 |
fungi | lifeless: unfortunately nodepool is logging when it starts launching a new node in the tripleo cloud, but then basically nothing thereafter... there are 35 nodes it thinks are in a "building" state at this point, but i haven't yet found an explanation for why none of them are transitioning to a ready state. digging deeper | 01:18 |
anteaya | do expand | 01:18 |
anteaya | personally I have been very proud of the way neutron has been addressing issues | 01:18 |
anteaya | they haven't been blocking the gate for others, that I have seen | 01:18 |
anteaya | and they are responsive when asked to address issues | 01:18 |
anteaya | I have been away all day | 01:19 |
anteaya | and have missed the context you are referencing jogo | 01:19 |
anteaya | but am here now and interested in listening | 01:19 |
*** chuck__ has joined #openstack-infra | 01:21 | |
dstufft | jeblair: fungi well I don't know much, but VanL says Python needs one and I trust VanL :) | 01:21 |
*** jcooley_ has quit IRC | 01:23 | |
*** rpodolyaka has joined #openstack-infra | 01:24 | |
fungi | SpamapS: lifeless: so this seems to be the situation... nodepool is configured to keep at least 35 nodes from tripleo on hand, and currently doesn't see a demand for more than that. it *thinks* it's building 35 currently (the oldest has been building for over 7 hours and the youngest for right at 1 hour). it won't try building more until one of those reaches the 8 hour timeout or the demand for nodes | 01:25 |
fungi | grows past 35. and it doesn't have any way to force building nodes to a delete state other than the 8 hour timeout or hearing back from the provider that the build failed | 01:25 |
* anteaya makes tea and hopes jogo returns because she would like to know what he is talking about | 01:25 | |
fungi | SpamapS: lifeless: how recently should it have actually started working? | 01:26 |
StevenK | fungi: So is it getting 404's, or is it actually talking correctly? | 01:27 |
StevenK | fungi: Since the tripleo cloud says it has no nodes | 01:28 |
fungi | StevenK: i see no evidence it's getting a 404. it may not be getting a completion or socket closure from the nova boot call... i'll see how many established sockets we have to the api endpoint | 01:28 |
fungi | hmm... one established socket to 138.35.77.16:13000 but i guess it reuses an open socket so that was no help | 01:29 |
mordred | dstufft: lawyers tend to say that people need things that make lawyers more money | 01:30 |
clarkb | jogo: so I have WIP'd my er change because I think we should revert sdague's change so that it is clear that it needs to be one way and not the other | 01:30 |
mordred | sdague: it's hard to remember sometimes that lawyers work for us and not the other way around | 01:30 |
mordred | gah | 01:30 |
mordred | dstufft, not sdague | 01:30 |
*** dkliban has quit IRC | 01:31 | |
fungi | mordred: you only say this because you're married to a law major | 01:31 |
*** jnoller has quit IRC | 01:31 | |
kevinbenton | anteaya: thanks for the link | 01:31 |
anteaya | kevinbenton: np | 01:31 |
dstufft | mordred: well IANAL but it seems to me the case against CLA hinges on an implicit license of some contribution just because the project that the patch was against has a particular license | 01:31 |
dstufft | I don't think you can reasonably assert that legally though :/ | 01:32 |
anteaya | and you bring good thoughts, just hard to find fertile ground for gate redesign today | 01:32 |
anteaya | weary and all | 01:32 |
*** jcooley_ has joined #openstack-infra | 01:32 | |
StevenK | fungi: Apparently, there is a template build underway, can you talk to the node that is doing that? | 01:32 |
StevenK | fungi: ssh, ping, etc | 01:33 |
fungi | StevenK: nodepool does not believe that it is currently building a template. it knows about two ready images for which the template server is assumed to be long gone | 01:34 |
fungi | StevenK: the two images it's aware of are from roughly 3 and 7 days ago respectively (it tries to build new images nightly, but the persistent outages have impeded that) | 01:34 |
lifeless | fungi: 3-4 hours back ? | 01:35 |
lifeless | fungi: I'll delete the template that's building too ? | 01:35 |
*** nosnos has joined #openstack-infra | 01:35 | |
lifeless | fungi: uuid was 56fe3f9e-365e-4609-a70e-3c171aba3fba | 01:36 |
fungi | lifeless: okay, good to know. is it possible that the several-day-old image it's trying to boot new nodes from is broken somehow, in ways that are causing it not to get notification that nova boot is failing? | 01:36 |
lifeless | fungi: I don't think so | 01:36 |
*** thuc has quit IRC | 01:36 | |
fungi | lifeless: i can try to create an updated image and find out what (if anything) breaks | 01:36 |
fungi | at least i'll get some fairly verbose output from the process | 01:37 |
*** thuc has joined #openstack-infra | 01:37 | |
*** zhiwei has joined #openstack-infra | 01:37 | |
*** jcoufal has joined #openstack-infra | 01:37 | |
lifeless | fungi: +1 | 01:37 |
fungi | started new build now | 01:37 |
lifeless | nova list shows a template started | 01:38 |
lifeless | I can ping the template | 01:38 |
fungi | that's good. so far i have no appreciable output, but generally wouldn't until the ssh interaction begins | 01:38 |
lifeless | ok so if nodepool thinks it has 35 nodes | 01:38 |
lifeless | fungi: I can recheck a couple of things | 01:38 |
fungi | nodepool thinks 35 nodes are currently in the process of building | 01:39 |
fungi | some started slightly over an hour ago | 01:39 |
*** jcooley_ has quit IRC | 01:39 | |
fungi | nodepool has logged into and is puppeting the template in progress now | 01:39 |
*** jcooley_ has joined #openstack-infra | 01:40 | |
lifeless | fungi: | 01:40 |
lifeless | check-tripleo-seed-precise NOT_REGISTERED | 01:40 |
lifeless | check-tripleo-undercloud-precise NOT_REGISTERED | 01:40 |
lifeless | check-tripleo-overcloud-precise NOT_REGISTERED | 01:40 |
lifeless | fungi: I guess that means jenkins doesn't think it can run the job at all ? | 01:40 |
fungi | lifeless: that's because none of the jenkins masters have had any tripleo-precise nodes added to them since we restarted zuul | 01:40 |
*** stevebaker has quit IRC | 01:40 | |
*** stevebaker has joined #openstack-infra | 01:40 | |
fungi | normally the jenkins masters register jobs associated with node labels into zuul's gearman server when nodes which can run them are added | 01:41 |
*** thuc has quit IRC | 01:41 | |
zhiwei | fungi: hi | 01:41 |
lifeless | fungi: is there a visible log of that template build | 01:42 |
zhiwei | I saw there is no rename stackforge project in #infra meeting agenda. | 01:42 |
fungi | zhiwei: it's on the agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting but wasn't discussed during the meeting due to time constraints. outside the meeting we did touch base with SergeyLukjanov who is the savanna ptl and he mentioned that they're waiting on foundation legal feedback before they can settle on the final name for theirs, so probably next week or the | 01:43 |
fungi | week after would be my guess | 01:43 |
*** dprince has joined #openstack-infra | 01:44 | |
*** Ryan_Lane has joined #openstack-infra | 01:44 | |
*** vkozhukalov has joined #openstack-infra | 01:44 | |
fungi | lifeless: i don't think that client-initiated image updates end up in the image logs, though they might. i'm just watching the console spew from it | 01:44 |
zhiwei | ok, thanks. This process blocked too long. | 01:45 |
lifeless | fungi: is it progressing? | 01:45 |
openstackgerrit | A change was merged to openstack-infra/config: Log recheckwatchbot messages https://review.openstack.org/77629 | 01:45 |
fungi | lifeless: it _was_ but has stopped squawking and gone silent. the last few lines were from the start of the install_modules.sh run... http://paste.openstack.org/show/72737/ | 01:46 |
fungi | it's been sitting there for several minutes now | 01:47 |
fungi | i'll ssh into it and see what's happening | 01:47 |
*** SumitNaiksatam has quit IRC | 01:47 | |
*** Ryan_Lane has quit IRC | 01:48 | |
*** dcramer_ has joined #openstack-infra | 01:51 | |
fungi | lifeless: oh, while i was fiddling with getting the ip address and correct ssh key, it updated | 01:51 |
fungi | so it *does* seem to be progressing after all | 01:51 |
*** ryanpetrello has joined #openstack-infra | 01:52 | |
*** gokrokve has joined #openstack-infra | 01:53 | |
lifeless | cool | 01:53 |
lifeless | ok, so I think we need to go | 01:54 |
fungi | it's moved on past puppeting to git repo caching | 01:54 |
lifeless | to get to the group dinner | 01:54 |
lifeless | it's clearly a working cloud | 01:54 |
fungi | lifeless: hop to it, and i'll leave you updates in scrollback | 01:54 |
lifeless | hopefully nodepool will sort its stuff out overnight..... | 01:54 |
lifeless | thanks! | 01:54 |
clarkb | fungi: before I forget how do I check for the nova git thing timing out when merging in zuul? | 01:54 |
clarkb | fungi: would like to check to see if any happened this afternoon after the restart | 01:54 |
*** rwsu has quit IRC | 01:54 | |
*** sabari has joined #openstack-infra | 01:55 | |
fungi | clarkb: sudo grep "did not appear in the git repo" /var/log/zuul/debug.log | 01:55 |
clarkb | danke | 01:55 |
fungi | or debug.log.2014-03-05 | 01:55 |
fungi | last hit was 17:58:46 | 01:55 |
clarkb | 2014-03-05 17:58:46,913 is the timestamp for the last one | 01:55 |
clarkb | jinx | 01:55 |
fungi | yupsies | 01:55 |
clarkb | it appears to be much happier now | 01:55 |
kevinbenton | clarkb, antaeya: i made a graph that i think shows the probability of a failure. it compares "pipeline gate total changes" with the "gerrit event change merged" by day | 01:56 |
fungi | like clams in... clamato | 01:56 |
kevinbenton | can't seem to export a link to the graph though | 01:56 |
kevinbenton | here is the image | 01:56 |
kevinbenton | http://graphite.openstack.org/render/?width=858&height=502&_salt=1394070660.667&yStep=&from=00%3A00_20140301&until=23%3A59_20140201&xFormat=%25a&yDivisors=&minXStep=10&target=divideSeries(diffSeries(summarize(stats_counts.zuul.pipeline.gate.total_changes%2C%20%221day%22)%2C%20summarize(stats_counts.gerrit.event.change-merged%2C%20%221day%22))%2C%20summarize(stats_counts.zuul.pipeline.gate.total_changes%2C%20%22 | 01:56 |
kevinbenton | 2)) | 01:56 |
kevinbenton | here is the data string | 01:57 |
kevinbenton | divideSeries(diffSeries(summarize(stats_counts.zuul.pipeline.gate.total_changes, "1day"), summarize(stats_counts.gerrit.event.change-merged, "1day")), summarize(stats_counts.zuul.pipeline.gate.total_changes, "1day")) | 01:57 |
reed | jeblair, re: ops, add fifieldt too | 01:57 |
anteaya | kevinbenton: can I get a shortened link? | 01:57 |
fungi | reed: jeblair: good point. he's even in apac! | 01:58 |
anteaya | kevinbenton: weechat can't handle links that linewrap | 01:58 |
*** harlowja has quit IRC | 01:58 | |
kevinbenton | anteaya: http://bit.ly/1jT5OEU | 01:58 |
anteaya | thanks | 01:58 |
* anteaya clicks | 01:58 | |
fungi | "ZeroDivisionError: integer division or modulo by zero" | 01:59 |
fungi | maybe i pasted the link parts together incorrectly | 01:59 |
kevinbenton | the bit.ly one should work | 01:59 |
anteaya | kevinbenton: what is the vertical axis? | 01:59 |
fungi | yeah, i must have. the link shortener one works for me | 01:59 |
kevinbenton | ratio | 01:59 |
kevinbenton | anteya: ratio of failures to successes | 01:59 |
anteaya | so a high ratio is good? | 02:00 |
fungi | high ratio bad | 02:00 |
kevinbenton | anteaya: low ratio is good | 02:00 |
anteaya | k | 02:00 |
fungi | kevinbenton: good to see that we're trending downward! | 02:00 |
kevinbenton | anteaya: sorry i keep messing up your name. my fingers don't like all of the vowels :-) | 02:00 |
anteaya | it seems to have data for thursday and friday | 02:01 |
anteaya | I thought it was wednesday today | 02:01 |
anteaya | kevinbenton: np | 02:01 |
anteaya | I usually do ke tabcomplete for you | 02:01 |
fungi | anteaya: it's been thursday for a couple hours now | 02:01 |
anteaya | or an tab complete for me | 02:01 |
anteaya | and anne gentle and I get each others messages a lot | 02:01 |
anteaya | it is thursday | 02:02 |
anteaya | so it is 2:20 utc | 02:02 |
fungi | 2:02 according to my sun dial | 02:02 |
kevinbenton | anteaya: whoops, i think i didn't have the end date set right | 02:02 |
anteaya | yes me as well | 02:02 |
kevinbenton | http://bit.ly/NVDVkg | 02:02 |
* anteaya clicks again | 02:02 | |
clarkb | kevinbenton: so that will get you an upper bound | 02:03 |
clarkb | kevinbenton: but not an exact number because changes can be removed from the pipeline without merging for reasons other than flaky tests | 02:03 |
clarkb | kevinbenton: if you push a new patchset the one in the gate pipeline is removed, if a reviewer -2's a change it won't merge after testing | 02:03 |
kevinbenton | clarkb: yeah, i tried to find a stat for gate job failure so it was more explicit | 02:03 |
clarkb | kevinbenton: but that should give you a reasonable upper bound | 02:03 |
fungi | clarkb: kevinbenton: in fact, we just today realized that openstack/requirements merges and the requirements proposal job are a major culprit there | 02:04 |
*** mrodden has quit IRC | 02:04 | |
*** harlowja has joined #openstack-infra | 02:04 | |
clarkb | fungi: ya I should fix that when I have a minute | 02:04 |
clarkb | I think I have rewritten that script about 4 times now. I should feel bad | 02:04 |
fungi | at least when the gate is longer, the chances of requirements sync changes being removed from the gate by new patchsets are higher | 02:04 |
kevinbenton | clarkb: that's the other issue, this would probably vary a lot from project to project, right? | 02:05 |
anteaya | kevinbenton: yes depends on the bug du jour | 02:05 |
fungi | kevinbenton: significantly. especially keeping in mind that we host a lot of projects whose jobs are not part of the main integrated gate queue | 02:06 |
anteaya | and an oslo change broke all of nova for a while yesterday | 02:06 |
anteaya | kevinbenton: until there was a new config.sample merged | 02:06 |
fungi | anteaya: twice in fact ;) | 02:06 |
*** thedodd has joined #openstack-infra | 02:06 | |
anteaya | the first was the oslo change with the dependency | 02:06 |
anteaya | the second was the stale config file | 02:06 |
anteaya | yes? | 02:07 |
kevinbenton | is there a task specifically that i can look in the stats for that marks a job as failed? | 02:07 |
fungi | yep | 02:07 |
anteaya | yay, starting to catch on | 02:07 |
fungi | kevinbenton: there is, but not all jobs which are run in the gate end up being significant since changes can be retested when other changes ahead of them fail | 02:07 |
clarkb | kevinbenton: ya notmyname has a thing together | 02:07 |
clarkb | kevinbenton: https://github.com/notmyname/gate_status he has it hosted somewhere too | 02:08 |
fungi | kevinbenton: further complicated by the fact that when jenkins cancels jobs for a variety of other reasons, those can also often be incorrectly reported as a job failure | 02:08 |
kevinbenton | fungi: well i would want to include that case | 02:08 |
kevinbenton | fungi | 02:08 |
kevinbenton | i'm looking for the things that trigger the downstream jobs to have to get rebased and restarted | 02:09 |
anteaya | http://not.mn/gate_status.html | 02:09 |
fungi | kevinbenton: why? if a change's running jobs are cancelled and retried, that doesn't necessarily imply a separate event. it's usually a symptom of a failure of a change further ahead | 02:09 |
*** jeckersb_gone is now known as jeckersb | 02:10 | |
*** SumitNaiksatam has joined #openstack-infra | 02:10 | |
*** Ryan_Lane has joined #openstack-infra | 02:10 | |
*** jcooley_ has quit IRC | 02:11 | |
* clarkb AFKs | 02:12 | |
kevinbenton | fungi: right. but resets is what i'm looking for because it supports the notion of focusing more resources at the top of the gate to get to success or failure faster | 02:12 |
kevinbenton | fungi: because a reset is wasted compute resources for the downstream jobs | 02:12 |
fungi | kevinbenton: so, we've already made some very recent changes which do just that | 02:13 |
kevinbenton | fungi: cool! details? | 02:13 |
fungi | there's a scaling heuristic which decides how many or how few of the changes at the front of the gate should be tested, and varies by the recent pass/fail frequency of other changes | 02:13 |
kevinbenton | anteaya: thanks, that link was exactly what i was trying to make | 02:14 |
fungi | also, zuul now knows that as soon as a change has at least one failing job, it and its dependent changes (if any) should step aside from the main series and allow changes to shift forward to be tested on top of the other changes which are already succeeding | 02:14 |
kevinbenton | fungi: so what i was getting at earlier was dividing the tempest tests for a single change across more compute nodes | 02:15 |
kevinbenton | fungi: so a job can pass/fail within 30 minutes instead of an hour or whatever | 02:16 |
fungi | part of the challenge is that we have limited compute resources with which to accomplish this, and need to try not to starve check and other pipelines while still prioritizing resources assigned to jobs for changes in the gate pipeline | 02:16 |
*** morganfainberg is now known as morganfainberg_Z | 02:16 | |
fungi | kevinbenton: but yes, we've approached the potential for distributed testing of long-running jobs. i think sdague and mtreinish may have some details on that front | 02:17 |
*** malini is now known as malini_afk | 02:17 | |
fungi | i know several ways of accomplishing that were discussed. i think their analysis concluded that overall throughput would most likely diminish because of setup/teardown overhead eating into the overall capacity | 02:18 |
*** Ryan_Lane has quit IRC | 02:18 | |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard: [WIP] Updated oslo https://review.openstack.org/78496 | 02:19 |
fungi | partly because tempest depends on building a cloud from scratch incorporating the proposed changes, and that part takes a nontrivial amount of time, and would have to be done on each discrete unit where part of the test was being performed | 02:19 |
*** david-lyle has joined #openstack-infra | 02:20 | |
fungi | so the more widely we distribute that load, the more overall capacity we lose to setup overhead | 02:20 |
fungi | individual changes would be tested in less time, but the number of changes we could test in parallel would also decrease | 02:21 |
*** krotscheck has quit IRC | 02:21 | |
fungi | in a nonlinear envelope | 02:21 |
*** chandan_kumar has joined #openstack-infra | 02:21 | |
*** dprince has quit IRC | 02:22 | |
fungi | there might be a sweet spot where that overhead is balanced by the estimated chance a change would suffer from a reset due to a failing change ahead. the task of working out where that moving point is from one moment to the next seems fairly daunting | 02:23 |
fungi | (former math theory major hat on) | 02:24 |
kevinbenton | fungi: that's where the average daily gate failure rate comes into play :-) | 02:24 |
kevinbenton | fungi: as that increases, the more resources shift towards the top | 02:24 |
*** ryanpetrello has quit IRC | 02:24 | |
anteaya | kevinbenton: you are assuming stability where there is not so much | 02:24 |
anteaya | host dns issues | 02:25 |
anteaya | image building issues | 02:25 |
anteaya | mirror issues | 02:25 |
fungi | kevinbenton: i think daily is far too coarse. we get incidents which break all gating for short periods of time while we scramble to solve the underlying cause, and then long periods of relative calm with background noise from nondeterminism in some tests | 02:25 |
anteaya | there is much we do to address the situation | 02:25 |
anteaya | and we have to constantly have flak jackets on for stuff that happens that we never even thought of | 02:25 |
*** jcoufal has quit IRC | 02:25 | |
kevinbenton | anteaya, fungi: ah, i didn't realize how much firefighting was involved :-) | 02:26 |
fungi | rather a lot | 02:26 |
anteaya | much firefighting | 02:26 |
anteaya | two weeks ago was fun | 02:26 |
anteaya | jenkins upgrade downgrade | 02:26 |
*** dstanek has quit IRC | 02:26 | |
anteaya | that took all of a day | 02:26 |
anteaya | and the gate was basically at a stand still | 02:26 |
*** dstanek has joined #openstack-infra | 02:26 | |
anteaya | and fungi was doing the heavy lifting | 02:27 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/storyboard: Make project description longer https://review.openstack.org/78492 | 02:27 |
fungi | basically, as a combined meta-project with more than a hundred outside dependencies, any one of which could spontaneously make a broken release, and relying on network communication over the internet and between cloud hosts which can sometimes be flaky, there's a lot of running from one emergency to the next | 02:27 |
anteaya | while the rest of us did what we could to help | 02:27 |
*** nati_uen_ has quit IRC | 02:27 | |
fungi | much of our non-firefighting work goes into finding ways to increase scalability and robustness of these systems to help mitigate the emergencies we can up front, but we can't really predict what the next day will bring with much certainty | 02:28 |
*** bada_ has joined #openstack-infra | 02:28 | |
*** bada has quit IRC | 02:28 | |
kevinbenton | so maybe it needs a nice aggressive factor then, like TCP window reset. :-) | 02:28 |
anteaya | kevinbenton: and optimizing something breaks or causes a bottleneck somewhere else | 02:28 |
anteaya | pretty much every time | 02:29 |
fungi | kevinbenton: that's precisely what the current zuul dependent pipeline windowing was patterned after in fact... tcp slow-start | 02:29 |
anteaya | and we have ddosed ourselves more than once with an optimization that did the broken thing very quickly | 02:29 |
*** sabari has quit IRC | 02:30 | |
kevinbenton | fungi: so the only missing part then is piling on some extra instances ;-) | 02:30 |
anteaya | kevinbenton: got any to spare? | 02:30 |
kevinbenton | anteaya: yeah, i can imagine with that many compute nodes ddosing wouldn't be hard | 02:30 |
fungi | kevinbenton: yep. we're in constant talks with our current generous resource donors about quota increases, and always entertaining offers from other interested parties | 02:31 |
anteaya | happens pretty quick | 02:31 |
anteaya | then we have to figure out why, very fast | 02:31 |
anteaya | then how to apply the fix | 02:31 |
*** zhiyan_ is now known as zhiyan | 02:31 | |
kevinbenton | anteaya: unfortunately no extra servers here | 02:31 |
anteaya | bringing down the system is not an option, we have to restart pieces and time them correctly | 02:31 |
fungi | kevinbenton: next time you're bored, tell 1000 of your servers to clone the same 80mb git repo simultaneously ;) | 02:31 |
anteaya | just to see what happens | 02:32 |
kevinbenton | does github block IPs speaking of which? | 02:32 |
fungi | kevinbenton: they do have throttles, but we don't use github (partly for that reason, and also because they're not free software) | 02:32 |
kevinbenton | i noticed our BigSwitch-CI has trouble cloning the repos sometimes | 02:32 |
anteaya | we use cgit | 02:32 |
anteaya | git.openstack.org | 02:33 |
anteaya | your ci should clone from there | 02:33 |
kevinbenton | is that what i should be cloning from? | 02:33 |
anteaya | also known as git.o.o | 02:33 |
fungi | you should clone from wherever you want to, honestly | 02:33 |
*** jnoller has joined #openstack-infra | 02:33 | |
fungi | i'm not going to claim that git.openstack.org lacks any single points of failure | 02:33 |
anteaya | true | 02:33 |
kevinbenton | well will git.openstack.org reset my connections if i'm cloning quite a bit? | 02:33 |
fungi | though i did just grow and upgrade the server darm there this week | 02:34 |
fungi | s/darm/farm/ | 02:34 |
anteaya | you did | 02:34 |
anteaya | kevinbenton: only when we ddos ourselves | 02:34 |
anteaya | so usually, it shouldn't | 02:34 |
anteaya | and if it does, tell us | 02:34 |
kevinbenton | k, and shared fate with the regular jenkins test isn't a bad thing | 02:34 |
anteaya | you might be the canary in the coal mine | 02:34 |
fungi | kevinbenton: we don't throttle it, no, but we recommend you pre-cache within your network when possible. it'll speed things up for you if you can (we do, and then just pull updates since the last nightly cache on our slave images) | 02:34 |
anteaya | kevinbenton: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=2 | 02:35 |
fungi | way faster than doing a full clone of each project on every job | 02:35 |
kevinbenton | fungi: it is based on a cache, but it's getting to be a couple weeks old now | 02:35 |
kevinbenton | fungi: did an initial devstack run to get everything cloned | 02:35 |
kevinbenton | fungi: then just set the RECLONE=True option so it just pulls updates | 02:36 |
*** jnoller has quit IRC | 02:36 | |
fungi | kevinbenton: we clone all the projects once each night for each type of server image in each provider and then pull updates during the day when jobs use servers built from those images. it's been working well so far | 02:36 |
kevinbenton | fungi: yeah, that'll be the next step for me to automate | 02:37 |
*** Alexandra is now known as alex-lunch | 02:37 | |
fungi | kevinbenton: we already have that automated. it's all free software if you want to bend it to your own purposes | 02:37 |
kevinbenton | fungi: which project is that one? | 02:38 |
fungi | kevinbenton: https://git.openstack.org/cgit/openstack-infra/nodepool and the scripts we use with it are http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/nodepool/ | 02:39 |
fungi | if nothing else, they might serve as good examples for how we solved/are solving these problems for ourselves | 02:39 |
kevinbenton | it shouldn't be a big step for me to automate the caching at this point | 02:40 |
fungi | yep. http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/nodepool/scripts/cache_git_repos.py | 02:40 |
kevinbenton | thanks | 02:41 |
fungi | anyway, there's some good examples in there you can lift if needed. all apache-licensed | 02:41 |
fungi | and if you have bug fixes, please feel free to put them up for review. we always welcome the help! | 02:42 |
*** sabari has joined #openstack-infra | 02:42 | |
kevinbenton | okay, i'll check these out | 02:43 |
*** sabari has quit IRC | 02:43 | |
kevinbenton | thanks! i'm off for the night | 02:43 |
fungi | you're welcome, of course! me too i think | 02:43 |
* anteaya nods | 02:45 | |
*** rpodolyaka has quit IRC | 02:46 | |
*** thuc has joined #openstack-infra | 02:47 | |
anteaya | we have another submitted, merge pending issue: https://review.openstack.org/#/c/78168/ | 02:49 |
anteaya | another dependency issue | 02:49 |
*** khyati has quit IRC | 02:50 | |
*** bada has joined #openstack-infra | 02:50 | |
*** dkliban has joined #openstack-infra | 02:50 | |
*** thuc has quit IRC | 02:52 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Remove inaccurate docs about wildcards https://review.openstack.org/78444 | 02:52 |
*** bada_ has quit IRC | 02:52 | |
fungi | yep, looks like another broken rebase | 02:53 |
jeblair | fungi: oh! we should make sure that case isn't detected by the replication check | 02:54 |
*** sarob_ has joined #openstack-infra | 02:55 | |
fungi | jeblair: i think my patchset to remove the replication check still checks that gerrit has reported the change in a merged state | 02:55 |
*** markmcclain has joined #openstack-infra | 02:55 | |
fungi | unless i'm misreading what that code is meant to accomplish | 02:56 |
*** markmcclain has quit IRC | 02:56 | |
jeblair | fungi: i think you are right, except: if status == 'MERGED' or status == 'SUBMITTED': | 02:57 |
fungi | data = self.gerrit.query(change.number) | 02:57 |
fungi | change._data = data | 02:57 |
fungi | change.is_merged = self._isMerged(change) | 02:57 |
*** sarob_ has quit IRC | 02:57 | |
fungi | ohhh | 02:57 |
jeblair | fungi: so we're actually accepting submitted as merged. that seems wrongish. | 02:57 |
fungi | yep, i missed that inside of _isMerged() | 02:58 |
*** sarob_ has joined #openstack-infra | 02:58 | |
fungi | and i COMPLETELY agree | 02:58 |
fungi | we have multiple examples which prove the two are not at all synonymous | 02:58 |
jeblair | i wonder what i was smoking 1.5 years ago. | 02:58 |
*** sweston has quit IRC | 02:59 | |
fungi | we should fix that, even if we don't remove the replication check, because i'm pretty sure it's wrong even just on principle | 02:59 |
*** thomasem has joined #openstack-infra | 02:59 | |
jeblair | fungi: more than that... guess what the commit msg for the commit that adds that is... | 02:59 |
*** chandan_kumar has quit IRC | 02:59 | |
fungi | i'll submit a separate one-liner for that right now | 02:59 |
jeblair | fungi: "Initial commit." | 02:59 |
fungi | bwahahaha | 02:59 |
fungi | at least we didn't add it later for some worse reason | 03:00 |
jeblair | i'm guessing we were a little more loosy goosy about this gate thing at the time. :) | 03:00 |
fungi | salad days | 03:00 |
*** SumitNaiksatam has quit IRC | 03:00 | |
jeblair | i think that was also before zuul was doing things like not testing changes whose dependencies weren't approved, so i might have actually intended to handle that case like that. | 03:01 |
jeblair | but yeah, pretty wrong now. | 03:01 |
*** stevebaker has quit IRC | 03:02 | |
*** stevebaker has joined #openstack-infra | 03:02 | |
*** sarob_ has quit IRC | 03:02 | |
*** SumitNaiksatam has joined #openstack-infra | 03:03 | |
*** thomasem has quit IRC | 03:04 | |
*** prad has quit IRC | 03:04 | |
*** gokrokve_ has joined #openstack-infra | 03:04 | |
*** stevebaker has quit IRC | 03:05 | |
*** stevebaker has joined #openstack-infra | 03:05 | |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/zuul: Submitted is _not_ necessarily merged in Gerrit https://review.openstack.org/78504 | 03:05 |
fungi | so we don't forget later ^ | 03:05 |
jeblair | thx | 03:05 |
fungi | and with that, off to do eveningish things | 03:06 |
jeblair | g'night | 03:06 |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: fix setting of default values for missing parameters in jenkins.ini file. https://review.openstack.org/78458 | 03:06 |
fungi | oh, also in unrelated news, the debian-security team is strongly considering continuing security support past normal eol for stable... https://lists.debian.org/debian-devel-announce/2014/03/msg00004.html | 03:07 |
fungi | e.g., a "squeeze-lts" suite or similar | 03:08 |
*** gokrokve has quit IRC | 03:08 | |
fungi | one more data point to toss into the blender | 03:08 |
*** Ryan_Lane has joined #openstack-infra | 03:09 | |
*** alex-lunch is now known as Alexandra | 03:09 | |
*** sabari has joined #openstack-infra | 03:15 | |
*** wchrisj has quit IRC | 03:21 | |
*** SumitNaiksatam has quit IRC | 03:22 | |
*** SumitNaiksatam has joined #openstack-infra | 03:23 | |
*** unicell has joined #openstack-infra | 03:27 | |
*** vkozhukalov has quit IRC | 03:27 | |
*** wchrisj has joined #openstack-infra | 03:35 | |
*** jcooley_ has joined #openstack-infra | 03:37 | |
*** stevebaker has quit IRC | 03:38 | |
*** stevebaker has joined #openstack-infra | 03:38 | |
*** stevebaker has quit IRC | 03:42 | |
*** stevebaker has joined #openstack-infra | 03:42 | |
*** Ryan_Lane has quit IRC | 03:42 | |
*** stevebaker has quit IRC | 03:42 | |
*** stevebaker has joined #openstack-infra | 03:43 | |
*** pcrews has left #openstack-infra | 03:43 | |
*** rcleere has joined #openstack-infra | 03:44 | |
*** jcooley_ has quit IRC | 03:53 | |
*** sgordon_ has joined #openstack-infra | 03:55 | |
*** wenlock has joined #openstack-infra | 03:55 | |
*** fifieldt has joined #openstack-infra | 03:55 | |
sgordon_ | so i will probably be like the 500th person to mention this | 03:55 |
sgordon_ | but is *.openstack.org down? | 03:55 |
sgordon_ | nm as i typed that it's back | 03:56 |
fifieldt | www.openstack.org looks up for me | 03:56 |
*** Sam-I-Am has joined #openstack-infra | 03:56 | |
*** Sam-I-Am has left #openstack-infra | 03:56 | |
*** bada has quit IRC | 04:00 | |
*** sarob_ has joined #openstack-infra | 04:03 | |
kevinbenton | what's the appropriate reverify statement for when jenkins died | 04:09 |
kevinbenton | ? | 04:09 |
kevinbenton | http://logs.openstack.org/75/73575/22/gate/gate-neutron-python27/2915949/console.html | 04:10 |
*** julim has joined #openstack-infra | 04:11 | |
*** jcooley_ has joined #openstack-infra | 04:12 | |
*** Ryan_Lane has joined #openstack-infra | 04:14 | |
*** wchrisj has quit IRC | 04:15 | |
*** julim has quit IRC | 04:21 | |
*** stevebaker has quit IRC | 04:27 | |
*** stevebaker has joined #openstack-infra | 04:27 | |
*** Ryan_Lane has quit IRC | 04:30 | |
*** Ryan_Lane1 is now known as Ryan_Lane | 04:30 | |
*** Ryan_Lane has joined #openstack-infra | 04:30 | |
*** jcooley_ has quit IRC | 04:30 | |
*** Ryan_Lane1 has joined #openstack-infra | 04:30 | |
*** jcooley_ has joined #openstack-infra | 04:31 | |
*** jcooley_ has quit IRC | 04:32 | |
*** sarob_ has quit IRC | 04:45 | |
*** sarob_ has joined #openstack-infra | 04:45 | |
*** stevebaker has quit IRC | 04:47 | |
*** stevebaker has joined #openstack-infra | 04:47 | |
*** harlowja is now known as harlowja_away | 04:49 | |
*** sarob_ has quit IRC | 04:50 | |
*** mrodden has joined #openstack-infra | 04:50 | |
*** Ryan_Lane1 has quit IRC | 04:53 | |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: fix setting of default values for missing parameters in jenkins.ini file. https://review.openstack.org/78458 | 04:53 |
*** CaptTofu has quit IRC | 04:54 | |
*** thuc has joined #openstack-infra | 04:56 | |
*** thuc_ has joined #openstack-infra | 04:56 | |
*** esker has joined #openstack-infra | 04:57 | |
*** thuc has quit IRC | 05:01 | |
*** sarob_ has joined #openstack-infra | 05:02 | |
*** harlowja_away is now known as harlowja | 05:02 | |
*** wchrisj has joined #openstack-infra | 05:07 | |
*** sabari has quit IRC | 05:15 | |
*** sabari has joined #openstack-infra | 05:15 | |
*** sweston has joined #openstack-infra | 05:16 | |
*** nati_ueno has joined #openstack-infra | 05:22 | |
wenlock | kevinbenton, recheck no bug .... ? | 05:23 |
kevinbenton | wenlock: i thought maybe there was a bug for jenkins dying :-) | 05:24 |
*** jcooley_ has joined #openstack-infra | 05:24 | |
*** sarob_ has quit IRC | 05:30 | |
*** sarob_ has joined #openstack-infra | 05:30 | |
*** sarob_ has quit IRC | 05:35 | |
*** jcooley_ has quit IRC | 05:40 | |
*** jcooley_ has joined #openstack-infra | 05:43 | |
*** wchrisj has quit IRC | 05:44 | |
*** nicedice has quit IRC | 05:46 | |
*** jcooley_ has quit IRC | 05:47 | |
*** jcooley_ has joined #openstack-infra | 05:49 | |
*** talluri has joined #openstack-infra | 05:49 | |
*** jcoufal has joined #openstack-infra | 05:51 | |
*** jcooley_ has quit IRC | 05:52 | |
*** gyee has quit IRC | 06:00 | |
*** jhesketh_ has quit IRC | 06:00 | |
*** jcoufal has quit IRC | 06:01 | |
*** jhesketh has quit IRC | 06:03 | |
*** chandan_kumar has joined #openstack-infra | 06:04 | |
*** gokrokve_ has quit IRC | 06:04 | |
*** gokrokve has joined #openstack-infra | 06:04 | |
*** wenlock has quit IRC | 06:07 | |
*** gokrokve has quit IRC | 06:08 | |
*** reed has quit IRC | 06:13 | |
*** gokrokve has joined #openstack-infra | 06:15 | |
*** nati_uen_ has joined #openstack-infra | 06:21 | |
*** thuc has joined #openstack-infra | 06:23 | |
*** nati_ueno has quit IRC | 06:24 | |
*** Alexandra is now known as alex-gone | 06:24 | |
*** thuc_ has quit IRC | 06:26 | |
*** thuc has quit IRC | 06:27 | |
*** mrda is now known as mrda_away | 06:28 | |
*** skraynev_afk is now known as skraynev | 06:30 | |
*** thedodd has quit IRC | 06:34 | |
*** CaptTofu has joined #openstack-infra | 06:34 | |
*** Daisy has joined #openstack-infra | 06:38 | |
*** CaptTofu has quit IRC | 06:39 | |
*** sweston has quit IRC | 06:39 | |
*** amcrn has joined #openstack-infra | 06:40 | |
*** sarob_ has joined #openstack-infra | 06:41 | |
*** sarob_ has quit IRC | 06:45 | |
clarkb | sdague: when you start your morning, instead of checking zuul status can you propose a revert of https://review.openstack.org/#/c/66564/ ? There are reasons that they are instance level variables, we need to load the logging config in setup_logging before any of those loggers are grabbed | 06:45 |
clarkb | sdague: I think we will need one more change on top of the revert to catch other uses of LOG that were copy pasta'd around | 06:45 |
Daisy | clarkb: could you help to review and push this patch: https://review.openstack.org/#/c/68042/ | 06:49 |
*** Ryan_Lane1 has joined #openstack-infra | 06:50 | |
clarkb | Daisy: I can certainly take a look | 06:50 |
Daisy | I'm eager to see whether it works. It's time for translation team to start message translation. I hope it could run as soon as possible. | 06:51 |
clarkb | Daisy: ok, SergeyLukjanov is usually on in a little bit, I can give it the first +2 and hopefully he can review and approve | 06:52 |
*** pblaho has joined #openstack-infra | 06:53 | |
Daisy | Thank you ! | 06:53 |
*** pblaho has quit IRC | 06:53 | |
*** pblaho has joined #openstack-infra | 06:54 | |
clarkb | I starred the change and will try to remember to look at it in the morning my time | 06:54 |
clarkb | and shepherd it if necessary | 06:54 |
*** vogxn has joined #openstack-infra | 06:54 | |
*** jamielennox is now known as jamielennox|away | 06:57 | |
*** harlowja is now known as harlowja_away | 06:59 | |
*** briancurtin has quit IRC | 06:59 | |
*** denis_makogon has joined #openstack-infra | 07:01 | |
*** jpich has joined #openstack-infra | 07:01 | |
*** jlibosva has joined #openstack-infra | 07:07 | |
openstackgerrit | afazekas proposed a change to openstack-infra/elastic-recheck: Add fingerprint for bug 1288579 https://review.openstack.org/78531 | 07:07 |
*** oubiwann-ef has quit IRC | 07:13 | |
*** ildikov_ has quit IRC | 07:16 | |
SergeyLukjanov | Daisy, clarkb, https://review.openstack.org/#/c/68042/ approved | 07:16 |
Daisy | Thanks ! | 07:16 |
fifieldt | all hail infra | 07:18 |
* fifieldt bows | 07:18 | |
openstackgerrit | A change was merged to openstack-infra/config: Job to push Horizon translation to Transifex https://review.openstack.org/68042 | 07:18 |
jpich | Great :) | 07:19 |
*** harlowja_away has quit IRC | 07:21 | |
*** saju_m has joined #openstack-infra | 07:22 | |
*** Alexey has joined #openstack-infra | 07:23 | |
*** yolanda_ has joined #openstack-infra | 07:27 | |
*** Alexey has quit IRC | 07:28 | |
*** alexey has joined #openstack-infra | 07:28 | |
*** alexey has quit IRC | 07:28 | |
*** achuprin has joined #openstack-infra | 07:29 | |
*** vogxn has quit IRC | 07:31 | |
*** sarob_ has joined #openstack-infra | 07:32 | |
*** thuc has joined #openstack-infra | 07:34 | |
*** adrian_otto has joined #openstack-infra | 07:35 | |
*** sarob_ has quit IRC | 07:36 | |
openstackgerrit | A change was merged to openstack-infra/config: Adds ! defined() guards around a2mod declarations https://review.openstack.org/74443 | 07:37 |
*** thuc has quit IRC | 07:38 | |
*** sweston has joined #openstack-infra | 07:40 | |
achuprin | Hi Infra! | 07:41 |
achuprin | Can someone tell me who can help me with the creation of a Service Account for Third Party Testing? | 07:42 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Use venv to build documentation https://review.openstack.org/76190 | 07:43 |
*** oubiwan__ has joined #openstack-infra | 07:44 | |
achuprin | This is a link to my email request - https://www.mail-archive.com/openstack-infra@lists.openstack.org/msg00938.html | 07:44 |
clarkb | achuprin: we typically process them in batches and do a handful at a time | 07:44 |
*** sweston has quit IRC | 07:44 | |
clarkb | usually at least once a week. I will put it on my todo list to do those again. Hopefully I can get to that | 07:45 |
achuprin | ok, thanks! | 07:46 |
clarkb | fungi: should https://review.openstack.org/#/c/76280/2 be abandoned? | 07:49 |
*** oubiwan__ has quit IRC | 07:49 | |
openstackgerrit | A change was merged to openstack-infra/config: Restricting chef-cookbook-chefspec job to spec dir https://review.openstack.org/74339 | 07:53 |
*** sabari has quit IRC | 07:56 | |
*** basha has joined #openstack-infra | 07:57 | |
*** gokrokve has quit IRC | 08:00 | |
*** gokrokve has joined #openstack-infra | 08:00 | |
openstackgerrit | Mehdi Abaakouk proposed a change to openstack-infra/devstack-gate: Set CEILOMETER_PIPELINE_INTERVAL to 15 https://review.openstack.org/78537 | 08:01 |
*** gokrokve has quit IRC | 08:04 | |
*** ildikov_ has joined #openstack-infra | 08:05 | |
*** jpich has quit IRC | 08:05 | |
*** saju_m has quit IRC | 08:06 | |
*** e0ne has joined #openstack-infra | 08:10 | |
ttx | jeblair: maybe fifieldt, dunno if he spends enough time on enough channels though | 08:15 |
ttx | also travels a lot, so not "reliably" apac | 08:15 |
rcarrillocruz | hey guys, does openstackgerrit user run with code hosted in https://github.com/openstack-infra/gerritbot ? | 08:17 |
openstackgerrit | A change was merged to openstack-infra/config: Watch also havana branch for packstack https://review.openstack.org/76206 | 08:17 |
clarkb | yes it does | 08:18 |
clarkb | though on an admittedly old commit | 08:18 |
fifieldt | hi | 08:18 |
fifieldt | ttx, mmm? | 08:18 |
*** saju_m has joined #openstack-infra | 08:18 | |
rcarrillocruz | ah...that would explain, cos in the most recent code I don't see strings used by openstack gerrit! | 08:19 |
rcarrillocruz | thx clarkb | 08:19 |
*** dstanek has quit IRC | 08:20 | |
rcarrillocruz | ttx: hi, i'm looking at starting with some infra low-hanging-fruit bugs. I saw https://bugs.launchpad.net/openstack-ci/+bug/1250758 . Do you mean having a single yaml containing all events to be pushed by gerrit review and later gerrit syncs up with Google Calendar? Or on the contrary you maybe mean having a folder for holding calendar events, one yaml per event... | 08:22 |
*** CaptTofu has joined #openstack-infra | 08:22 | |
*** flaper87|afk is now known as flaper87 | 08:22 | |
*** saju_m has quit IRC | 08:24 | |
*** rlandy has joined #openstack-infra | 08:26 | |
*** CaptTofu has quit IRC | 08:27 | |
ttx | rcarrillocruz: we have a group of students on that project now | 08:27 |
ttx | rcarrillocruz: so probably a bad idea to duplicate effort | 08:28 |
ttx | rcarrillocruz: i should have updated the bug, sorry. Doing it now | 08:28 |
*** hashar has joined #openstack-infra | 08:28 | |
*** hashar has quit IRC | 08:28 | |
*** jgallard has joined #openstack-infra | 08:28 | |
clarkb | I need to sleep but one thing I need to write a bug for is hosting kibana 3 off of logstash.o.o | 08:30 |
rcarrillocruz | oh | 08:30 |
rcarrillocruz | k | 08:30 |
rcarrillocruz | gsoc or... ? | 08:30 |
clarkb | we are all kibana2 now and need to join the future but I haven't had time to put that together in a bug | 08:31 |
rcarrillocruz | any low hanging fruit that you know is not handled by anyone ? | 08:31 |
*** sarob_ has joined #openstack-infra | 08:32 | |
*** dizquierdo has joined #openstack-infra | 08:33 | |
*** openstackgerrit has quit IRC | 08:34 | |
*** openstackgerrit has joined #openstack-infra | 08:34 | |
*** saju_m has joined #openstack-infra | 08:36 | |
*** sarob_ has quit IRC | 08:37 | |
*** basha has quit IRC | 08:39 | |
*** hashar has joined #openstack-infra | 08:40 | |
*** sweston has joined #openstack-infra | 08:41 | |
*** sarob_ has joined #openstack-infra | 08:42 | |
*** gokrokve has joined #openstack-infra | 08:43 | |
*** gokrokve_ has joined #openstack-infra | 08:45 | |
*** sweston has quit IRC | 08:45 | |
*** sarob_ has quit IRC | 08:46 | |
*** basha has joined #openstack-infra | 08:47 | |
*** gokrokve has quit IRC | 08:47 | |
*** denis_makogon has quit IRC | 08:48 | |
*** gokrokve_ has quit IRC | 08:49 | |
*** basha has quit IRC | 08:52 | |
ttx | rcarrillocruz: no it's a group of students at NDSU | 08:53 |
ttx | working with lbragstad | 08:53 |
*** jpich has joined #openstack-infra | 08:54 | |
*** basha has joined #openstack-infra | 08:55 | |
openstackgerrit | A change was merged to openstack-infra/config: Make sure lvm2 tools are installed https://review.openstack.org/76244 | 09:03 |
*** Daisy has quit IRC | 09:04 | |
*** rlandy has quit IRC | 09:04 | |
*** andreaf has joined #openstack-infra | 09:08 | |
*** saju_m has quit IRC | 09:12 | |
*** yassine has joined #openstack-infra | 09:13 | |
*** bada has joined #openstack-infra | 09:14 | |
*** fbo_away is now known as fbo | 09:17 | |
*** mkerrin has quit IRC | 09:18 | |
*** rossella_s has joined #openstack-infra | 09:18 | |
*** mkerrin has joined #openstack-infra | 09:19 | |
*** rlandy has joined #openstack-infra | 09:19 | |
*** hashar has quit IRC | 09:20 | |
*** Ryan_Lane has quit IRC | 09:20 | |
*** Ryan_Lane1 has quit IRC | 09:20 | |
*** zhiwei has quit IRC | 09:22 | |
*** hashar has joined #openstack-infra | 09:22 | |
ttx | sdague: wrote http://fnords.wordpress.com/2014/03/06/why-we-do-feature-freeze/ instead of the blanket email | 09:22 |
ttx | would have made an email too long, also easier to reuse in the future | 09:23 |
*** johnthetubaguy has joined #openstack-infra | 09:24 | |
*** jooools has joined #openstack-infra | 09:24 | |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Content-Type can now be set for email-ext publisher https://review.openstack.org/75919 | 09:26 |
rcarrillocruz | ttx: just got assigned https://bugs.launchpad.net/openstack-ci/+bug/1227985 , hope it's not also taken by students to your knowledge? | 09:30 |
*** sarob_ has joined #openstack-infra | 09:33 | |
ttx | rcarrillocruz: nope, nobody on it to my knowledge ;) | 09:34 |
ttx | rcarrillocruz: and thanks so much for helping ! | 09:34 |
rcarrillocruz | np, thx | 09:35 |
*** sarob_ has quit IRC | 09:37 | |
*** SumitNaiksatam has quit IRC | 09:39 | |
*** ociuhandu has quit IRC | 09:39 | |
*** SumitNaiksatam has joined #openstack-infra | 09:39 | |
*** basha has quit IRC | 09:42 | |
*** gokrokve has joined #openstack-infra | 09:45 | |
*** zhiwei has joined #openstack-infra | 09:45 | |
*** gokrokve has quit IRC | 09:50 | |
*** Ryan_Lane has joined #openstack-infra | 09:51 | |
*** yamahata has quit IRC | 09:56 | |
*** lawcen has joined #openstack-infra | 09:57 | |
*** Ryan_Lane has quit IRC | 09:58 | |
*** lahoucine has joined #openstack-infra | 09:58 | |
*** saju_m has joined #openstack-infra | 09:58 | |
*** lawcen has quit IRC | 09:58 | |
*** jerryz has quit IRC | 09:59 | |
*** jp_at_hp has joined #openstack-infra | 10:00 | |
*** hashar_ has joined #openstack-infra | 10:04 | |
*** hashar has quit IRC | 10:05 | |
*** morganfainberg_Z is now known as morganfainberg | 10:06 | |
*** hashar_ is now known as hashar | 10:08 | |
SergeyLukjanov | ttx, thx for the why FF blogpost, great explanation of the process, I'm glad to use it to explain FF to Savanna contributors | 10:09 |
ttx | SergeyLukjanov: glad you enjoyed it :) | 10:10 |
*** CaptTofu has joined #openstack-infra | 10:11 | |
*** rpodolyaka has joined #openstack-infra | 10:14 | |
*** malini_afk is now known as malini | 10:14 | |
*** CaptTofu has quit IRC | 10:15 | |
*** dmakogon_ is now known as denis_makogon | 10:18 | |
*** enikanorov has quit IRC | 10:18 | |
*** rpodolyaka has quit IRC | 10:18 | |
*** enikanorov has joined #openstack-infra | 10:18 | |
*** yolanda_ has quit IRC | 10:19 | |
*** yolanda_ has joined #openstack-infra | 10:20 | |
*** yolanda_ has quit IRC | 10:25 | |
*** yolanda_ has joined #openstack-infra | 10:27 | |
*** sarob_ has joined #openstack-infra | 10:34 | |
*** sarob_ has quit IRC | 10:38 | |
*** sweston has joined #openstack-infra | 10:41 | |
*** amotoki has joined #openstack-infra | 10:42 | |
*** gokrokve has joined #openstack-infra | 10:45 | |
*** sweston has quit IRC | 10:46 | |
*** saju_m has quit IRC | 10:46 | |
*** yamahata has joined #openstack-infra | 10:47 | |
*** adrian_otto1 has joined #openstack-infra | 10:49 | |
*** adrian_otto has quit IRC | 10:49 | |
*** gokrokve has quit IRC | 10:50 | |
*** saju_m has joined #openstack-infra | 11:03 | |
*** jgallard has quit IRC | 11:13 | |
*** hashar has quit IRC | 11:13 | |
*** ociuhandu has joined #openstack-infra | 11:14 | |
*** rpodolyaka has joined #openstack-infra | 11:15 | |
*** adrian_otto1 has quit IRC | 11:15 | |
*** rossella_s has quit IRC | 11:18 | |
*** rossella_s has joined #openstack-infra | 11:19 | |
*** rpodolyaka has quit IRC | 11:20 | |
*** andre__ has joined #openstack-infra | 11:22 | |
*** lcostantino has joined #openstack-infra | 11:27 | |
*** CaptTofu has joined #openstack-infra | 11:32 | |
*** sarob_ has joined #openstack-infra | 11:35 | |
*** sarob_ has quit IRC | 11:39 | |
sdague | ttx: well, the blanket email with the pointer would be good, as the FFE runs fierce | 11:40 |
ttx | sdague: I posted the link on the ML | 11:40 |
sdague | it was deep in another thread though, right? | 11:41 |
ttx | yes... I feel like I would be abusing to post it twice though | 11:41 |
ttx | looks like selfpromotion | 11:41 |
*** sweston has joined #openstack-infra | 11:42 | |
ttx | sdague: i'll post a vacation notice later, maybe I can mention it (as part of the "sean will run them" notification) there | 11:43 |
ttx | sdague: I hope most will be covered this week, and you'll only have to check progress at the Tuesday meeting | 11:43 |
ttx | but i try not to be too hopeful :) | 11:44 |
sdague | heh | 11:44 |
ttx | late FFE requests generally come from PTLs though, rather than random devs | 11:44 |
sdague | well, I'll just be mean. Unless it's ZOMG nova won't start without this, I think any FFE showing up late needs to wait | 11:44 |
sdague | yeh, the events api in nova is the one that will need to be sorted | 11:45 |
sdague | because that's finally figuring out why neutron + nova races, and a way to stop doing that | 11:45 |
ttx | All tracked at https://etherpad.openstack.org/p/icehouse-FFEs -- still need to sync with markmcclain, jgriffith and markwash | 11:45 |
*** gokrokve has joined #openstack-infra | 11:45 | |
*** sweston has quit IRC | 11:47 | |
ttx | shall have a pretty complete picture by eod | 11:48 |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Only public user fields in unauthorized requests https://review.openstack.org/78586 | 11:48 |
*** gokrokve has quit IRC | 11:49 | |
*** e0ne_ has joined #openstack-infra | 11:51 | |
*** e0ne has quit IRC | 11:51 | |
sdague | sounds good. | 11:53 |
sdague | SergeyLukjanov: do you have enough visibility into nodepool to know why it's stalled? | 11:54 |
sdague | looks like python3 nodes are all gone | 11:55 |
SergeyLukjanov | sdague, I have no access to our infra servers | 11:56 |
sdague | ok | 11:56 |
SergeyLukjanov | sdague, probably, we can find smth in http://nodepool.openstack.org/image.log | 11:56 |
sdague | also, the number of devstack nodes is pretty low | 11:56 |
SergeyLukjanov | sdague, yup, graph doesn't look healthy | 11:57 |
SergeyLukjanov | and 162 CR in check | 11:58 |
*** sgordon_ has quit IRC | 12:00 | |
sdague | yep | 12:02 |
*** ArxCruz has joined #openstack-infra | 12:02 | |
SergeyLukjanov | sdague, looks like we have no py33 nodes (or lack of them) for at least 5h | 12:11 |
sdague | yeh | 12:11 |
sdague | thus begins the long wait for fungi to get up | 12:11 |
mkoderer | hi folks... does someone know if the recheck of VMware Mine Sweeper works? | 12:14 |
mkoderer | https://review.openstack.org/#/c/73982/ | 12:14 |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Auth Token Middleware https://review.openstack.org/74735 | 12:15 |
*** rpodolyaka has joined #openstack-infra | 12:16 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Only public user fields in unauthorized requests https://review.openstack.org/78586 | 12:16 |
*** weshay has joined #openstack-infra | 12:19 | |
*** rpodolyaka has quit IRC | 12:20 | |
*** bada has quit IRC | 12:21 | |
*** jhesketh has joined #openstack-infra | 12:24 | |
*** jhesketh has quit IRC | 12:25 | |
*** morganfainberg is now known as morganfainberg_Z | 12:27 | |
*** mwagner_lap has quit IRC | 12:27 | |
*** jnoller has joined #openstack-infra | 12:29 | |
*** yassine has quit IRC | 12:29 | |
*** jnoller has quit IRC | 12:37 | |
openstackgerrit | Brad P. Crochet proposed a change to openstack-infra/jenkins-job-builder: Added support for Exclusion plugin https://review.openstack.org/77940 | 12:38 |
*** dstanek has joined #openstack-infra | 12:41 | |
*** sweston has joined #openstack-infra | 12:42 | |
*** sarob_ has joined #openstack-infra | 12:45 | |
*** gokrokve has joined #openstack-infra | 12:45 | |
*** sweston has quit IRC | 12:47 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Remove empty CONF import and useage https://review.openstack.org/78595 | 12:47 |
*** mriedem has joined #openstack-infra | 12:47 | |
*** gokrokve has quit IRC | 12:49 | |
*** sarob_ has quit IRC | 12:50 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Added DELETE method for projects, stories, and tasks. https://review.openstack.org/77763 | 12:50 |
jooools | ls | 13:00 |
*** mfink has quit IRC | 13:00 | |
*** zhiwei has quit IRC | 13:00 | |
*** yamahata has quit IRC | 13:00 | |
*** CaptTofu has quit IRC | 13:09 | |
*** david-lyle has quit IRC | 13:10 | |
*** yamahata has joined #openstack-infra | 13:13 | |
*** esker has quit IRC | 13:13 | |
*** esker has joined #openstack-infra | 13:14 | |
*** esker has quit IRC | 13:14 | |
*** smarcet has joined #openstack-infra | 13:14 | |
*** esker has joined #openstack-infra | 13:14 | |
*** sandywalsh has joined #openstack-infra | 13:15 | |
*** dims has quit IRC | 13:18 | |
*** dims has joined #openstack-infra | 13:19 | |
*** esker has quit IRC | 13:19 | |
anteaya | kevinbenton: in future, https://bugs.launchpad.net/openstack-ci/+bug/1284371 looks like a good candidate | 13:26 |
*** saju_m has quit IRC | 13:27 | |
*** pdmars has joined #openstack-infra | 13:29 | |
*** dcramer_ has quit IRC | 13:33 | |
*** hashar has joined #openstack-infra | 13:33 | |
*** chuck__ has quit IRC | 13:33 | |
*** mfink has joined #openstack-infra | 13:34 | |
*** madmike has joined #openstack-infra | 13:35 | |
*** bknudson has left #openstack-infra | 13:35 | |
*** andre__ has quit IRC | 13:35 | |
openstackgerrit | A change was merged to openstack-infra/config: Support filtering by review id(s) https://review.openstack.org/72446 | 13:35 |
*** sarob_ has joined #openstack-infra | 13:36 | |
*** andre__ has joined #openstack-infra | 13:36 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Added DELETE method for projects, stories, and tasks. https://review.openstack.org/77763 | 13:37 |
*** eharney has joined #openstack-infra | 13:37 | |
*** mbacchi has joined #openstack-infra | 13:38 | |
*** mfink has quit IRC | 13:38 | |
*** sarob_ has quit IRC | 13:40 | |
*** rlandy_ has joined #openstack-infra | 13:41 | |
*** yamahata has quit IRC | 13:41 | |
anteaya | mkoderer: have you gone through the logs that vm minesweeper provides? http://208.91.1.172/logs/neutron/73982/4/4011460/ | 13:41 |
*** e0ne has joined #openstack-infra | 13:42 | |
mkoderer | anteaya: yep and it's not related to my fix.. sdague already told me that I can ignore it | 13:43 |
anteaya | mkoderer: can you email vm minesweeper and let them know that you are ignoring their results and why? | 13:43 |
anteaya | oh, guess you can't, no email there yet | 13:44 |
*** rlandy has quit IRC | 13:44 | |
anteaya | can you ping salv-orlando about it? | 13:44 |
mkoderer | anteaya: and the recheck doesn't work | 13:44 |
mkoderer | salv-orlando: ping | 13:44 |
*** jgallard has joined #openstack-infra | 13:44 | |
anteaya | and I had thought salv-orlando had offered a vm minesweeper email to be added to the account | 13:45 |
anteaya | mkoderer: thank you | 13:45 |
*** freyes has joined #openstack-infra | 13:45 | |
*** gokrokve has joined #openstack-infra | 13:45 | |
mkoderer | anteaya: you're welcome | 13:45 |
*** e0ne_ has quit IRC | 13:46 | |
sdague | we really need fungi to wake up :) | 13:46 |
fungi | we do? | 13:49 |
*** gokrokve has quit IRC | 13:49 | |
*** thuc has joined #openstack-infra | 13:50 | |
*** thomasem has joined #openstack-infra | 13:51 | |
anteaya | the world can start to turn, fungi's up | 13:51 |
fungi | looks like we're low on slaves... particularly ones we only boot in rax | 13:51 |
*** dkliban has quit IRC | 13:51 | |
anteaya | if I understand correctly, yesterday you did an initial foray into testing to see if the only rax jobs could work on hpcloud | 13:52 |
fungi | yep, looks like almost no slaves in use in rax regions | 13:52 |
fungi | checking the quotas there | 13:52 |
*** malini is now known as malini_afk | 13:52 | |
fungi | anteaya: nope, rather we were testing whether we could use the new hp region instead of the old hp region (which needs bigger flavors, so we were testing a means of limiting the available ram on them) | 13:53 |
fungi | we've got a ton of rackspace nodes in a delete state for a long time, so i'll clear those while checking other things | 13:54 |
sdague | fungi: yeh, basically the whole of check is stalled out | 13:54 |
sdague | or down to a trickle | 13:54 |
sdague | also no single use pypy or python3 | 13:54 |
*** rlandy_ is now known as rlandy | 13:55 | |
fungi | right, we generally don't need as many of those so we only keep a few on hand, currently all in rax regions | 13:55 |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Set CEILOMETER_PIPELINE_INTERVAL to 15 https://review.openstack.org/78537 | 13:56 |
*** ryanpetrello has joined #openstack-infra | 13:56 | |
anteaya | fungi: ah thanks for clarifying | 13:57 |
anteaya | might it be worthwhile to spread out the rax only jobs to hpcloud as well, in future? | 13:57 |
anteaya | or did I just state the obvious again? | 13:58 |
fungi | yeah, that's something we've wanted to do | 13:58 |
openstackgerrit | Antoine Musso proposed a change to openstack-infra/zuul: Document the Zuul triggers https://review.openstack.org/77843 | 13:59 |
fungi | looks like we're at max ram in ord, even though we're only a little over 60% of our instance limit | 14:00 |
*** zns has joined #openstack-infra | 14:01 | |
fungi | and we maxed out the 5000 instances created per day there | 14:01 |
fungi | iad and dfw on the other hand have quite a bit of capacity | 14:02 |
*** rcarrillocruz1 has joined #openstack-infra | 14:03 | |
fungi | i'm going to stop puppet agent on nodepool and manually zero out the quota on ord to calm it down a bit | 14:03 |
*** rcarrillocruz has quit IRC | 14:03 | |
*** hartsocks has joined #openstack-infra | 14:08 | |
fungi | i'll also get started trying to add py3k-precise and bare-centos6 images in hpcloud-az1 and az3 (az2 is dead to me now) | 14:08 |
*** hartsocks has left #openstack-infra | 14:08 | |
*** amotoki has quit IRC | 14:09 | |
*** thuc has quit IRC | 14:10 | |
*** yamahata has joined #openstack-infra | 14:10 | |
*** thuc has joined #openstack-infra | 14:12 | |
*** bknudson has joined #openstack-infra | 14:12 | |
*** changbl has quit IRC | 14:14 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: create an integrated-gate template https://review.openstack.org/78612 | 14:15 |
sdague | fungi: can we get rax to change that instance create param, because that's now hit us multiple times | 14:16 |
fungi | sdague: jeblair has an e-mail thread going with them to sort it out since last week some time. i gather there's progress | 14:16 |
openstackgerrit | Flavio Percoco proposed a change to openstack-infra/devstack-gate: Archive config files along with logs https://review.openstack.org/69344 | 14:16 |
*** yamahata has quit IRC | 14:17 | |
fungi | we're painfully aware of the need to get it resolved | 14:17 |
*** yamahata has joined #openstack-infra | 14:17 | |
*** rpodolyaka has joined #openstack-infra | 14:17 | |
*** dkranz has joined #openstack-infra | 14:18 | |
*** thomasem has quit IRC | 14:18 | |
*** yamahata has quit IRC | 14:19 | |
*** yamahata has joined #openstack-infra | 14:19 | |
*** thomasem has joined #openstack-infra | 14:20 | |
mestery | Question for any infra folks: I'm working with the Linux Foundation to enable 3rd party testing for the OpenDaylight Neutron integration. | 14:20 |
*** julim has joined #openstack-infra | 14:20 | |
mestery | When our Jenkins is voting back, we don't see this in the logs of reviews we're running tests against. | 14:20 |
mestery | Is this expected at first? | 14:20 |
mestery | I don't even see our results in the reviews either, which is concerning. | 14:20 |
mestery | Logs on the Linux Foundation Jenkins server indicate it is voting back. | 14:20 |
*** rpodolyaka has quit IRC | 14:21 | |
anteaya | mestery: your account can't vote right now | 14:21 |
mestery | anteaya: OK, I figured that, thanks for confirming anteaya! | 14:22 |
anteaya | OpenDaylight Jenkins is in the non-voting group | 14:22 |
mestery | OK | 14:22 |
mestery | But, I don't see results in there either. | 14:22 |
mestery | We removed the "starting" post which was going into reviews per suggestion from markmcclain | 14:22 |
anteaya | https://review.openstack.org/#/admin/groups/270,members | 14:22 |
anteaya | that is great | 14:22 |
mestery | Cool :) | 14:23 |
anteaya | this group is the non-voting group and includes the voting group as a subset | 14:23 |
mestery | Mark indicated that wasn't needed and in fact was more annoying than anything | 14:23 |
anteaya | so | 14:23 |
mestery | OK, cool, thanks! | 14:23 |
anteaya | right now new 3rd party ci accounts start out in the non-voting group | 14:23 |
anteaya | to ensure their systems are stable | 14:23 |
mestery | Got it, that makes perfect sense. | 14:23 |
anteaya | they can comment on patches and vote on the sandbox repo: git.openstack.org/openstack-dev/sandbox | 14:24 |
anteaya | then once they are stable and have some history of being a reliable service and a good community member | 14:24 |
fungi | mestery: if you attempt to post a vote (a "vrif" score) when leaving a comment and the acl doesn't allow it, the api call will fail completely and you'll get no comment added at all. you can however configure it to leave a 0 score when commenting and i believe that should work | 14:24 |
mestery | fungi: Bingo, that's the problem I think! | 14:24 |
mestery | Linux Foundation told me they were voting +1. | 14:25 |
mestery | I'll have them change it to 0 for now. | 14:25 |
anteaya | they apply at their project's weekly meeting and then if the ptl agrees, the ptl talks to gerrit admin and you get into the voting group | 14:25 |
fungi | mestery: you can vote +1 on openstack-dev/sandbox right now to test out that the vrif score addition is working, just not on any other projects | 14:25 |
mestery | anteaya: Thank you for clarifying the process, much appreciated! | 14:25 |
mestery | fungi: How do I vote on that sandbox? Just change the repository when voting back? | 14:25 |
anteaya | mestery: np, right now people are being advised to ignore the 3rd party output since the group as a whole is not stable | 14:26 |
anteaya | see the backscroll in -qa | 14:26 |
fungi | mestery: once we move that account from the "third-party ci" group to the "voting third-party ci" group it will be able to vote on any project | 14:26 |
mestery | anteaya: OK, thanks! | 14:26 |
anteaya | this isn't an exercise in creating noise, the point is to create useful information people pay attention to, but right now it is just viewed as noise | 14:27 |
mestery | fungi: Thank you for the information. | 14:27 |
*** thuc has quit IRC | 14:27 | |
mestery | anteaya: 100% agree! My goal with the OpenDaylight Jenkins is to have it run jobs against both OpenStack and OpenDaylight code bases. :) | 14:27 |
anteaya | great | 14:27 |
mestery | So, less noise, more testing from both ends. | 14:27 |
anteaya | right | 14:27 |
mestery | And I got things working last night! | 14:27 |
mestery | Linux Foundation has some issues with their OpenStack cloud we're working through as well at the moment. | 14:27 |
mestery | So good we're non-voting to start with :) | 14:28 |
*** thuc has joined #openstack-infra | 14:28 | |
anteaya | so the more reliable all systems are the better for each individual 3rd party testing system | 14:28 |
mestery | Agreed anteaya. | 14:28 |
anteaya | great | 14:28 |
anteaya | let us know how else we can help | 14:28 |
anteaya | and also you might enjoy attending jaypipes 3rd party testing workshops in -meeting on mondays at 18:00 utc, I think that is the correct time | 14:29 |
mestery | I have that on my calendar, unfortunately I am on vacation next week, getting away from the Minnesota winters for a week with the family. :) | 14:29 |
*** yamahata has quit IRC | 14:29 | |
anteaya | good for you | 14:29 |
anteaya | you, me, ttx | 14:29 |
anteaya | next week is a popular choice to disappear | 14:29 |
mestery | Ah, you're gone as well? Where to? | 14:30 |
*** yamahata has joined #openstack-infra | 14:30 | |
anteaya | thailand | 14:30 |
mestery | I'm headed to San Diego | 14:30 |
mestery | Wow, have fun, sounds like a great trip! | 14:30 |
anteaya | great, been there before? | 14:30 |
mestery | Yes, though not with the family. Kids are excited to spend time on the beach. :) | 14:30 |
anteaya | thanks, I am looking forward to it, my host keeps calling me to make sure I have the train schedule | 14:30 |
anteaya | mestery: nice, I hope you have pleasant travels and lots of beach time | 14:31 |
mestery | Same to you anteaya! | 14:31 |
anteaya | thanks | 14:31 |
*** briancurtin has joined #openstack-infra | 14:31 | |
*** thuc has quit IRC | 14:31 | |
*** HenryG has quit IRC | 14:34 | |
*** aysyanne has joined #openstack-infra | 14:35 | |
*** fifieldt has quit IRC | 14:35 | |
*** wchrisj has joined #openstack-infra | 14:35 | |
*** dkranz has quit IRC | 14:36 | |
*** sarob_ has joined #openstack-infra | 14:36 | |
*** e0ne has quit IRC | 14:37 | |
*** e0ne has joined #openstack-infra | 14:37 | |
*** mgagne has quit IRC | 14:39 | |
*** afazekas has joined #openstack-infra | 14:39 | |
*** sarob_ has quit IRC | 14:41 | |
*** yamahata has quit IRC | 14:41 | |
*** rcarrillocruz has joined #openstack-infra | 14:41 | |
sdague | mestery: I'm excited about the OpenDaylight testing going on at LF in OpenStack style | 14:42 |
sdague | very cool | 14:42 |
mestery | sdague: Yes, me too! | 14:43 |
*** rcarrillocruz1 has quit IRC | 14:43 | |
mestery | sdague: We plan to move them to the full OpenStack setup with zuul, jjb, etc. very soon. | 14:43 |
*** dkranz has joined #openstack-infra | 14:43 | |
mestery | They are excited about it as well! | 14:43 |
*** dcramer_ has joined #openstack-infra | 14:44 | |
*** gokrokve has joined #openstack-infra | 14:45 | |
sdague | it would be nice to have a solid open source SDN. once that really hardens up, I'd like to see that as our neutron default cause | 14:48 |
sdague | case | 14:48 |
sdague | because the raw ovs approach... continues to be problematic, as we've seen in the gate. | 14:48 |
mestery | sdague: Agreed! I think it will get there in the Juno timeframe, which aligns with the next release of OpenDaylight (Helium) | 14:48 |
sdague | great | 14:48 |
mestery | The patches I have out now (devstack and Neutron) for ODL lay the groundwork. | 14:49 |
*** talluri has quit IRC | 14:49 | |
*** rlandy has quit IRC | 14:49 | |
*** zul has quit IRC | 14:49 | |
*** gokrokve has quit IRC | 14:49 | |
*** jnoller has joined #openstack-infra | 14:50 | |
*** talluri has joined #openstack-infra | 14:50 | |
*** jswarren has joined #openstack-infra | 14:50 | |
*** zul has joined #openstack-infra | 14:52 | |
*** dkranz has quit IRC | 14:52 | |
*** mfer has joined #openstack-infra | 14:53 | |
*** talluri has quit IRC | 14:54 | |
*** nosnos has quit IRC | 14:55 | |
*** dkranz has joined #openstack-infra | 14:57 | |
sdague | mestery: ok, some quick feedback on https://review.openstack.org/#/c/69774/ | 14:57 |
sdague | couple of questions on it, so let me know if there are answers, then I'm +2 | 14:57 |
mestery | sdague: Checking it out, thanks for the review! | 14:57 |
*** HenryG has joined #openstack-infra | 14:57 | |
mestery | sdague: I agree on the SERVICE_HOST comment, will default it to that. | 14:58 |
*** dkliban has joined #openstack-infra | 14:58 | |
mestery | sdague: what functions in devstack for adding config? Pointer? | 14:58 |
anteaya | also noticing this question got missed: How do I vote on that sandbox? Just change the repository when voting back? No, you need to set up your system to listen to the stream from the sandbox repo and then you need to submit a patchset to the sandbox repo to trigger a test run | 14:59 |
*** malini_afk is now known as malini | 14:59 | |
*** jnoller has quit IRC | 14:59 | |
sdague | mestery: https://github.com/openstack-dev/devstack/blob/master/functions-common#L40 | 14:59 |
mestery | anteaya: Got it, thanks! Is that required before moving to full voting? | 14:59 |
mestery | sdague: Cool! Thanks for the pointer! I'll rework a new patch with your comments addressed ASAP. | 15:00 |
sdague | I'm not 100% sure if it will work in your case, but if it will, that would be great | 15:00 |
anteaya | mestery: up to markmcclain and the rest of the project, but we are suggesting it and it is a good demonstration of how your system handles voting | 15:00 |
*** eharney has quit IRC | 15:00 | |
mestery | sdague: I'll try it out! | 15:00 |
mestery | anteaya: Thanks again for all the help! | 15:01 |
anteaya | np | 15:01 |
dstufft | mordred: sdague lifeless fungi clarkb whoever else, you may see some breakage in installs | 15:01 |
anteaya | dstufft: thanks | 15:01 |
dstufft | if you accidentally upgraded setuptools to 3.0+ | 15:01 |
fungi | dstufft: thanks for the heads up! | 15:01 |
sdague | dstufft: is this something we should block on our side? | 15:01 |
sdague | or is a fix imminent | 15:02 |
*** gokrokve has joined #openstack-infra | 15:02 | |
dstufft | setuptools 1.0 deprecated the "Feature" feature and setuptools 3 removed it, some projects were using it in their setup.py and those projects will fail to install if you have setuptools 3.0+ installed | 15:02 |
dstufft | (there was some issue that the deprecation warning wasn't very visible since it just used a standard DeprecationWarning from Python which are silent by default :/) | 15:03 |
*** jnoller has joined #openstack-infra | 15:03 | |
dstufft | but it won't be fixed upstream because it was a planned deprecation/removal | 15:03 |
fungi | i don't think we used feature in our setup.py files, but i suppose some of our dependencies may | 15:03 |
*** mwagner_lap has joined #openstack-infra | 15:04 | |
dstufft | fungi: I know cffi did | 15:04 |
dstufft | not sure if y'all depend on that or not, I think you do | 15:04 |
* fungi grumbles | 15:04 | |
fungi | in some places, i think | 15:04 |
dstufft | there's a fix for cffi getting pushed out now | 15:04 |
lifeless | dstufft: sadface | 15:04 |
dstufft | well getting patched | 15:04 |
dstufft | not sure when they'll do a release | 15:04 |
dstufft | I know of zope.interface and Markupsafe too | 15:04 |
sdague | yeh, pycrypto needs that | 15:04 |
sdague | I think that's the only place we hit it though | 15:05 |
*** rlandy has joined #openstack-infra | 15:05 | |
sdague | fungi: I guess an early mirror trigger might be in order once cffi is out ? | 15:05 |
fungi | sdague: possibly, if we end up using setuptools 3.0 inadvertently | 15:06 |
sdague | lifeless: at least you know the answer to your question in -dev | 15:06 |
dstufft | oh | 15:06 |
dstufft | didn't even notice that | 15:06 |
dstufft | welp glad I said something then :) | 15:06 |
sdague | fungi: https://review.openstack.org/#/c/78262/ - clarkb has +2ed it | 15:06 |
sdague | and he was the last one to touch that code | 15:06 |
fungi | yeah, i was going to follow suit and then he said something in irc after that which sounded like maybe he was recanting. let me refresh my memory | 15:08 |
*** zns has quit IRC | 15:08 | |
sdague | SergeyLukjanov: so on https://review.openstack.org/#/c/78612/ - I'd like to start smaller | 15:08 |
*** mgagne has joined #openstack-infra | 15:08 | |
sdague | and bring over the other jobs one at a time because I do think we need to revisit if they all actually need to be cogating | 15:09 |
SergeyLukjanov | sdague, ok, sounds reasonable | 15:09 |
*** Hefeweizen has quit IRC | 15:10 | |
sdague | and grenade, tempest full, and tempest neutron seems reasonably uncontroversial | 15:10 |
sdague | it did pick up a few places, like trove-client, that weren't in the mix on these | 15:10 |
*** denis_makogon has quit IRC | 15:10 | |
fungi | we now have a bare-centos6 image in hpcloud-az3 and nodes being launched from it | 15:11 |
mtreinish | sdague: no love for postgres :) | 15:11 |
sdague | mtreinish: lets start small, and sort out the rest as we go :) | 15:12 |
fungi | other bare-centos6 and py3k-precise images in az1 and az3 are near completion as well | 15:12 |
sdague | honestly, with current clean check, I think postgres could safely live on check only. | 15:12 |
openstackgerrit | A change was merged to openstack-infra/config: remove inline set -e that is preventing explanations https://review.openstack.org/78262 | 15:12 |
*** jaypipes has joined #openstack-infra | 15:13 | |
*** rlandy_ has joined #openstack-infra | 15:14 | |
*** freyes has quit IRC | 15:14 | |
openstackgerrit | Thierry Carrez proposed a change to openstack-infra/storyboard: Remove Branch and Milestone legacy tables https://review.openstack.org/77187 | 15:14 |
*** rlandy has quit IRC | 15:15 | |
*** rlandy_ is now known as rlandy | 15:15 | |
*** adrian_otto has joined #openstack-infra | 15:15 | |
ttx | fungi: another "submitted" thing: https://review.openstack.org/#/c/78168/ | 15:16 |
ttx | "Depends on commit 35b513c1b3a0770db00dbf4aed754d9d6d9614e5 which has no change associated with it" | 15:16 |
fungi | ttx: yeah, spotted that one last night | 15:16 |
fungi | someone screwed up a rebase, looked like | 15:17 |
ttx | fungi: looks pretty recent | 15:17 |
fungi | ttx: it stemmed from your approval about 25 hours ago | 15:18 |
fungi | ttx: i saw it last night | 15:18 |
ttx | Yeah, looks funny @ https://review.openstack.org/#/q/status:open+project:openstack/openstack-planet,n,z | 15:18 |
*** sarob_ has joined #openstack-infra | 15:20 | |
sdague | fungi: how you feeling about this - https://review.openstack.org/#/c/78612/ | 15:20 |
sdague | because I'd like to turn the heat jobs voting on top of that, instead of adding 30 lines to layout.yaml :) | 15:21 |
anteaya | fungi: so did we ever find out why all the rax nodes disappeared? was it quota? | 15:21 |
fungi | ttx: right, looks like 78168 was committed on top of 78161 after that commit was modified, but then only 78168 got pushed to gerrit without the modified commit for 78161 | 15:21 |
ttx | ew | 15:21 |
fungi | ttx: so since the parent commit didn't exist in gerrit, it didn't set up a dependency relationship between the commits, but then when it tried to merge gerrit realized there was a missing dependency there and refused | 15:22 |
jeblair | good morning | 15:22 |
fungi | morning jeblair | 15:23 |
*** amotoki has joined #openstack-infra | 15:23 | |
anteaya | morning jeblair | 15:24 |
fungi | sdague: i'll have a look in a bit. still trying to unwind whether there's anything else wrong in nodepool land | 15:24 |
sdague | fungi: cool, thanks | 15:24 |
*** sarob_ has quit IRC | 15:25 | |
jeblair | fungi, clarkb: i don't think it would be enough for zuul to remember the most recent change merged because what if 5 merge in a row (and that would be even faster if we remove the replication check) | 15:25 |
fungi | anteaya: they didn't all disappear. we were getting starved out of most of the less common (in this case py3k-precise and to a lesser extent bare-precise and bare-centos6) nodes because nodepoold was trying too hard to bring them up in rax-ord | 15:25 |
*** rpodolyaka has joined #openstack-infra | 15:26 | |
jeblair | fungi, clarkb: so something like remembering which project-branches were seen in the most recent X time is more correct; or having the merger create refs for all projects in the shared queue is probably most correct (but potentially slow) | 15:26 |
fungi | jeblair: right, almost certainly not just the most recent. more like some time window | 15:26 |
mestery | sdague: https://review.openstack.org/#/c/69774/ Addressed your main concern and few of the smaller ones. | 15:26 |
fungi | oh, you just said that | 15:26 |
mestery | sdague: Config file thing was tricky, see my comments on patchset 14 for more details. | 15:26 |
jeblair | fungi, clarkb: there's that kind of approach, or the other alternative is to try to measure replication completion more correctly. i don't have good ideas about that other than to tell zuul about all replication targets and have it check all of them (but what if one is intentionally down?). i'm not as keen on this. | 15:27 |
sdague | yeh, I wasn't sure if it would work or not | 15:27 |
sdague | +2 | 15:27 |
jeblair | fungi, clarkb: or teach zuul to read gerrit's process list. that seems really wrong. | 15:28 |
jeblair | mestery, sdague: you were talking about making odl the default neutron case... is odl testing something that can be done upstream? | 15:28 |
sdague | jeblair: yes, it could be | 15:29 |
mestery | jeblair: Eventually I'd like to get ODL as the default Neutron driver. The patch above moves us closer. | 15:29 |
*** rpodolyaka has quit IRC | 15:29 | |
sdague | not today | 15:29 |
sdague | but it could get there | 15:29 |
mestery | jeblair: My testing has shown ODL with Neutron is vastly more responsive than with the OVS agents. | 15:30 |
sdague | once opendaylight is a bit more tested | 15:30 |
mestery | +1 to what sdague is saying | 15:30 |
mestery | We're going to be testing OpenStack with each OpenDaylight commit in the OVSDB project soon as well. | 15:30 |
mestery | So it will get a lot of testing, both openstack and opendaylight | 15:30 |
sdague | I think the stretch goal of doing that by end of Juno is a good one | 15:30 |
mestery | agreed | 15:30 |
sdague | assuming the neutron team agrees as well with that as their default | 15:31 |
mestery | agreed sdague, markmcclain is aware of this, but will take discussion in Atlanta I think | 15:31 |
*** apevec has joined #openstack-infra | 15:31 | |
sdague | jeblair: so in summary, it's completely technically doable in upstream | 15:32 |
anteaya | fungi: ah | 15:32 |
sdague | and it's a policy decision that will need agreement | 15:32 |
jeblair | awesome! | 15:32 |
* mestery loves it when a plan comes together sdague. | 15:32 | |
apevec | russellb, vishy - https://review.openstack.org/76250 backport should make Nova Grizzly happen, please review | 15:32 |
dhellmann | good morning | 15:33 |
russellb | +A | 15:33 |
anteaya | morning dhellmann | 15:33 |
dhellmann | anteaya: I have some cold weather to ship back to Canada. | 15:34 |
*** david-lyle has joined #openstack-infra | 15:35 | |
openstackgerrit | A change was merged to openstack-infra/storyboard: Handle yaml files updates https://review.openstack.org/78491 | 15:35 |
anteaya | dhellmann: bring it | 15:36 |
dhellmann | anteaya: haha, I'll go see about postage | 15:36 |
*** markmcclain has joined #openstack-infra | 15:36 | |
anteaya | ah that will be a problem | 15:37 |
apevec | russellb, ...happy even, thanks! | 15:37 |
openstackgerrit | A change was merged to openstack-infra/storyboard: Auth Token Middleware https://review.openstack.org/74735 | 15:37 |
anteaya | dhellmann: our postal system is &^%^%^*&ed... ah less efficient than it could be | 15:37 |
openstackgerrit | A change was merged to openstack-infra/storyboard: Remove empty CONF import and useage https://review.openstack.org/78595 | 15:38 |
dhellmann | anteaya: UPS then? | 15:38 |
anteaya | dhellmann: fedex | 15:38 |
anteaya | UPS just keeps your stuff in a warehouse for ever | 15:38 |
anteaya | sending alerts and never able to find/deliver it | 15:38 |
dhellmann | I'll see if I can squeeze it into one of those little envelopes | 15:38 |
*** beagles has left #openstack-infra | 15:38 | |
*** jnoller has quit IRC | 15:38 | |
anteaya | dhellmann: go you, loves me some cold weather | 15:39 |
anteaya | which reminds me, I need to break out the parka, been so long I forgot to use it | 15:39 |
dhellmann | normally I do too, but I've had enough this year | 15:39 |
* anteaya nods | 15:39 | |
*** jnoller has joined #openstack-infra | 15:39 | |
anteaya | now having said all that, I am running away to thailand for a week | 15:39 |
dhellmann | haha | 15:39 |
*** thedodd has joined #openstack-infra | 15:39 | |
anteaya | but the package will be delayed anyway | 15:39 |
anteaya | so I will pick it up once I return | 15:40 |
*** juice has quit IRC | 15:40 | |
*** rpodolyaka has joined #openstack-infra | 15:40 | |
*** rpodolyaka1 has joined #openstack-infra | 15:40 | |
*** jnoller has quit IRC | 15:41 | |
*** oubiwan__ has joined #openstack-infra | 15:41 | |
*** krotscheck has joined #openstack-infra | 15:42 | |
*** wenlock has joined #openstack-infra | 15:43 | |
*** sweston has joined #openstack-infra | 15:43 | |
openstackgerrit | Matthew Treinish proposed a change to openstack-infra/devstack-gate: Add support for running with a custom regex filter https://review.openstack.org/77664 | 15:44 |
*** sarob_ has joined #openstack-infra | 15:45 | |
*** rcleere has quit IRC | 15:46 | |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Add bare-centos6 and py3k-precise nodes to hpcloud https://review.openstack.org/78639 | 15:48 |
*** sweston has quit IRC | 15:48 | |
fungi | okay, we've had jobs run and succeed on py3k-precise and bare-centos6 nodes in hpcloud-az1 and hpcloud-az3 so there's ^ a change for that | 15:48 |
anteaya | yay | 15:48 |
*** esker has joined #openstack-infra | 15:48 | |
fungi | https://jenkins06.openstack.org/job/gate-heat-python26/364/console https://jenkins06.openstack.org/job/gate-horizon-python26/190/console https://jenkins04.openstack.org/job/gate-python-marconiclient-pypy/3/console https://jenkins04.openstack.org/job/gate-python-marconiclient-python33/3/console | 15:48 |
anteaya | should we do just az-1 and az-3 for now? will the az-2 issues become a problem for these jobs, more so than other jobs? | 15:50 |
*** sarob_ has quit IRC | 15:50 | |
fungi | no, jobs for these won't run in az2 anyway until the images are able to build successfully there so we can boot nodes from them | 15:51 |
fungi | and the az2 problem might be cleared up by the time that's reviewed and merged anyway | 15:51 |
fungi | clarkb said someone in hp was going to have a look at the ticket this morning and try to dig into it | 15:52 |
*** pcrews has joined #openstack-infra | 15:55 | |
sdague | jeblair: I'd like to start rolling up the integrated-gate so we could turn on heat-slow as gating easier - https://review.openstack.org/#/c/78612/ (I think a reasonably minimal starting point) | 15:56 |
lifeless | fungi: so, cloud is still up, but no nodepool love | 15:56 |
anteaya | fungi: k, thanks | 15:56 |
krotscheck | How would I go about getting my ssh key on storyboard.openstack.org so I can go look at logs? | 15:56 |
krotscheck | Server behavior does not seem to match codebase behavior right now. | 15:57 |
fungi | sdague: is adding grenade to heat going to be problematic? | 15:57 |
sdague | fungi: it better not be :) | 15:57 |
sdague | honestly, right now it probably noops | 15:57 |
*** sarob_ has joined #openstack-infra | 15:57 | |
sdague | however having an upgrade job is part of the TC approved requirements for integrated projects | 15:58 |
*** rpodolyaka1 has quit IRC | 15:58 | |
fungi | sdague: trove as well i guess | 16:01 |
sdague | yep | 16:02 |
fungi | and ceiloclient | 16:02 |
jeblair | krotscheck: is there something we can look up quickly for you? | 16:02 |
mordred | krotscheck: http://paste.openstack.org/show/72784/ | 16:03 |
krotscheck | mordred: Thanks. | 16:04 |
*** dstufft has quit IRC | 16:04 | |
*** rcarrillocruz1 has joined #openstack-infra | 16:04 | |
*** dstufft has joined #openstack-infra | 16:05 | |
lahoucine | Hi everyone, I have deleted my old account "lahoucine <ben.lahoucine@gmail.com>" but it's still visible in gerrit. My current account "Lahoucine BENLAHMR <lahoucine@benlahmr.com>" is shown duplicated and is unselectable when trying to add it as a reviewer to a change. Does anyone know how I can definitely remove my old account "lahoucine <ben.lahoucine@gmail.com>", and how to make my current account "Lahoucine BENLAHMR <lahoucine@benlahmr.com>" work (resolve the duplication)? Thank you for your help! | 16:05 |
*** pblaho has quit IRC | 16:05 | |
anteaya | lahoucine: hi | 16:05 |
apevec | fungi, speaking of Trove - I've proposed https://review.openstack.org/77982 but looks like check job isn't using review branch so it fails | 16:05 |
anteaya | welcome | 16:06 |
jeblair | sdague: why do you want to reduce the integration tests in the gate? | 16:06 |
lahoucine | hi anteaya | 16:06 |
anteaya | lahoucine: fungi is our gerrit db account clean up person | 16:06 |
lahoucine | thanks | 16:06 |
fungi | lahoucine: i saw your bug report from earlier (and your private /msg which i hadn't gotten to yet). i'll take a look in a bit | 16:06 |
*** rcarrillocruz has quit IRC | 16:06 | |
sdague | jeblair: so this patch doesn't reduce that | 16:06 |
fungi | lifeless: seeing if i can tell why they're not booting now, but we did get successful completion of that image build last night | 16:06 |
sdague | it extracts out common, uncontroversial tests | 16:07 |
lahoucine | hi fungi, ok thanks | 16:07 |
jeblair | sdague: i know, but the reason to ditch SergeyLukjanov's work in favor of yours hinges on this | 16:07 |
*** juice has joined #openstack-infra | 16:07 | |
jeblair | sdague: so it looks like you don't think neutron-full, large-ops, neutron-large-ops, and cells should be gating | 16:07 |
fungi | well, those weren't removed from projects which are currently running them | 16:08 |
jeblair | fungi: yes, but those _are_ the jobs that are different from sergey's change | 16:08 |
sdague | so right now, neutron-full should not be gating | 16:08 |
sdague | and I think neutron-large-ops is worth thinking about as whether it's actually a co-gate job | 16:08 |
fungi | i'm in favor of SergeyLukjanov's change too, though it's currently wip | 16:09 |
jeblair | sdague: neutron-full is gating by virtue of being in check, fwiw. | 16:09 |
sdague | it's not actually voting | 16:09 |
* SergeyLukjanov reading backlog | 16:09 | |
jeblair | ok, well, SergeyLukjanov didn't change that anyway... should neutron-full not be in check? | 16:10 |
SergeyLukjanov | sdague, jeblair, my CR was just to extract common part of the gate | 16:10 |
*** oubiwan__ has quit IRC | 16:10 | |
SergeyLukjanov | I'm ok with starting from small pack | 16:11 |
jeblair | SergeyLukjanov: i know. sdague has one to extract a smaller set. but i want to know where this is going. | 16:11 |
jeblair | am i going to -1 the next 4 patches he submits because they remove gating jobs? or is he going to submit 4 patches that end up making his change the same as yours... | 16:11 |
sdague | jeblair: so SergeyLukjanov's set is a straight extract | 16:11 |
jeblair | these are questions that are worth asking beforehand. :) | 16:11 |
sdague | which means it doesn't include all the integrated projects | 16:11 |
sdague | because the integrated projects don't all run these things | 16:12 |
SergeyLukjanov | IIRC sdague wants to achieve clean set of integr gate by making small template and add only needed jobs to it and remove all other | 16:12 |
sdague | my approach is start minimal | 16:12 |
sdague | and enforce on all integrated projects | 16:12 |
sdague | which actually means turning on jobs on some of them | 16:12 |
jeblair | sdague: okay, cool, i just want to know what the end state is | 16:12 |
*** chandan_kumar has quit IRC | 16:13 | |
fungi | lifeless: so the building state nodes were still hanging around, and there were enough of them that nodepool didn't think you needed new ones. i'm deleting them now (it appears after the 8 hour mark nodes in a building state can be deleted, but until then there's a database lock on those rows) | 16:13 |
sdague | because my next patch is to add the heat-slow job to this, because I think we do need to co-gate on that. And otherwise I'd like to discuss at summit if the large ops jobs are actually co-gate or should be on just some projects | 16:14 |
jeblair | sdague: you think large-ops might be safe to asymmetrically gate? | 16:14 |
sdague | but after resistance to my drop of unit tests in the gate, I'm leaving removes until after summit | 16:14 |
sdague | jeblair: yes, because they are basically nova performance tests | 16:14 |
*** prad_ has joined #openstack-infra | 16:15 | |
fungi | lifeless: you should (hopefully) see 35 new instances spinning up in nova | 16:15 |
sdague | but, I think that's a summit discussion | 16:15 |
jeblair | sdague: would we be adding more large-ops jobs if we added it to your template? | 16:16 |
*** reed has joined #openstack-infra | 16:16 | |
sdague | yeh, you'd end up putting it on trove, for instance | 16:16 |
*** atiwari has joined #openstack-infra | 16:16 | |
jeblair | sdague: and, significantly more? i'm thinking it's fine if you want to remove it, but if it's running _nearly_ everywhere now, perhaps the more consistent thing to do would be to go ahead and add it to the template and run it everywhere now | 16:17 |
*** dizquierdo has quit IRC | 16:17 | |
jeblair | and then remove it when we decide to remove it, rather than deciding to freeze it arbitrarily where it is now | 16:17 |
sdague | jeblair: we were legitimately out of nodes the last couple of days, so I don't want to burn cycles on uselessness | 16:17 |
jeblair | sdague: then remove it everywhere. | 16:17 |
sdague | I'd rather do some real analysis before making that decision | 16:18 |
sdague | this was the point of doing the small version | 16:18 |
sdague | because I think we can all agree on that change. And I'd rather not hold that on all the decisions on the stuff that requires some analysis | 16:19 |
jeblair | sdague: the layout file is way too complicated; i'd much rather have it be comprehensible and represent what we are trying to accomplish and burn the occasional node on a trove test than not be able to understand why large-ops is on these 7 projects but not these other 2 | 16:19 |
*** sarob__ has joined #openstack-infra | 16:20 | |
fungi | and apparently the other reason we've been node starved is that each of rax-dfw and iad had 95 nodes stuck in building for more than 8 hours as well, so i'm deleting those now too | 16:20 |
*** changbl has joined #openstack-infra | 16:20 | |
sdague | jeblair: and you don't think it's better to actually have reasons for why each of these things are required for all integrated projects? | 16:20 |
jeblair | besides, if node exhaustion is really the most important thing, how's about we not run "gate-noop" on them | 16:21 |
*** smarcet has left #openstack-infra | 16:21 | |
*** andreaf has quit IRC | 16:21 | |
*** rcarrillocruz has joined #openstack-infra | 16:21 | |
jeblair | sdague: i do, but you don't want to examine those reasons until the summit. so until then you want to maintain the status quo. afaik the status quo is they run everywhere | 16:21 |
*** jcooley_ has joined #openstack-infra | 16:21 | |
sdague | the status quo is to run where they run | 16:21 |
*** sarob___ has joined #openstack-infra | 16:22 | |
jeblair | they only run not everywhere because we're really bad at updating this file. we're fixing that. | 16:22 |
sdague | I'm not convinced of that. A lot of times they don't run everywhere because when deciding 'does this job make sense here' the answer is no | 16:23 |
sdague | sometimes it's a miss, and sometimes it's a decision | 16:23 |
*** rcarrillocruz1 has quit IRC | 16:23 | |
jeblair | sdague: where does it make sense to run large ops then? | 16:23 |
fungi | i think the special snowflake decisions are adverse from a consistency standpoint | 16:24 |
*** thuc has joined #openstack-infra | 16:24 | |
sdague | honestly, I don't know. And I don't have the time to figure that out right now. So I don't want to add or remove that test to jobs until we do. | 16:24 |
fungi | because they make it a lot harder to tell the difference between intentional and accidental exceptions | 16:24 |
*** sarob__ has quit IRC | 16:24 | |
sdague | fungi: I agree with all of this, which is why this was about minimum step forward | 16:24 |
*** thuc has quit IRC | 16:25 | |
*** thuc has joined #openstack-infra | 16:25 | |
sdague | I have strong justification for the 3 jobs in the template that I added, which I can very much defend | 16:25 |
sdague | but all the rest of those... | 16:25 |
sdague | I don't know right now | 16:25 |
*** thuc has quit IRC | 16:26 | |
*** thuc has joined #openstack-infra | 16:26 | |
*** sarob___ has quit IRC | 16:26 | |
*** thuc has quit IRC | 16:27 | |
*** thuc has joined #openstack-infra | 16:27 | |
*** thuc has quit IRC | 16:28 | |
*** thuc has joined #openstack-infra | 16:28 | |
jeblair | sdague: so is there a next step after your patch before the summit? | 16:29 |
jeblair | sdague: you say: "After this merges we can take other jobs one at a time here." but i'm not seeing what the next step is except wait 3 months | 16:29 |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Auth hotfix https://review.openstack.org/78653 | 16:30 |
sdague | I think there needs to be analysis of jobs we want to lock into this, especially some documentation that they've managed to catch a cross project issue. | 16:30 |
jeblair | sdague: okay, so there won't actually be any follow-on patches for 3 months | 16:32 |
sdague | well, first it's 2 months | 16:32 |
sdague | secondly, not without some other analysis | 16:32 |
sdague | if we can do that offline, cool | 16:32 |
sdague | if not, ensure we discuss at summit | 16:33 |
sdague | honestly, I'm not sure why this is controversial | 16:33 |
*** coolsvap has joined #openstack-infra | 16:34 | |
jeblair | sdague: because if we merge your patch i don't know how to review further changes to zuul's configuration. | 16:34 |
sdague | this was minimal for a reason, because I actually want to talk with jogo about the large ops jobs, because I'd like to make sure we understand how they would find issues on a keystone change | 16:35 |
fungi | given that, i'm more in favor of whatever we can do now to templatize anything in layout.yaml which will shrink it substantially | 16:35 |
jeblair | are we in an "integrated job freeze" until the summit? can we add large ops jobs to missing projects? can we remove them? i dunno. | 16:35 |
jeblair | i know what I think the current status quo is -- we run all the jobs everywhere | 16:35 |
*** sabari has joined #openstack-infra | 16:35 | |
sdague | jeblair: we can add them with justification | 16:35 |
jeblair | and when we don't do that, it's only because the current structure makes it nearly impossible to catch that in review. and that's what we were trying to fix, and that's what you're asking us to give up. | 16:36 |
sdague | jeblair: we are so far away from running all the jobs everywhere though | 16:36 |
sdague | ok, so I give up then. I really thought this was a helpful step | 16:36 |
*** afazekas has quit IRC | 16:37 | |
*** saper_ has joined #openstack-infra | 16:38 | |
*** sabari3 has joined #openstack-infra | 16:38 | |
sdague | I feel like what's currently in the check and gate queues are based on whatever someone managed to get in once, and have a very low level of understanding on what's in there. So I'd rather not massively encode that in the template | 16:39 |
sdague | I'd rather encode in the template things we are *really* sure on | 16:39 |
sdague | and make it so that's a pretty high standard of *yes* absolutely, it's valuable to run this on all integrated projects for these following reasons | 16:39 |
sdague | ... | 16:39 |
fungi | for some reason, ord server boot count has jumped back to 5000 available in the past hour (from 0) so i'm putting it back into rotation too | 16:39 |
*** basha has joined #openstack-infra | 16:40 | |
fungi | though i think i'd like to get https://review.openstack.org/78639 merged before reenabling puppet on nodepool.o.o | 16:41 |
jeblair | sdague: right, but our thinking has been "don't create asymmetrical gates, and the more jobs we run, the more nondeterministic bugs we catch" | 16:41 |
*** adrian_otto has quit IRC | 16:41 | |
jeblair | sdague: i'm okay revisiting that, but it feels like your change is done with the thought that we should alter that thinking but the completion of that thought is months away | 16:42 |
*** sabari has quit IRC | 16:42 | |
jeblair | sdague: whereas there's an alternate approach out there which is that we just "fix" whatever things have slipped through the gap in our current thinking | 16:42 |
sdague | jeblair: with clean check required, blocking projects moving code forward because we like to catch non deterministic bugs their code seems very unfair | 16:42 |
sdague | non deterministic bugs that are unrelated to their code | 16:43 |
sdague | if we want more jobs, we should work on getting the other periodic queue running to build that result set | 16:43 |
jeblair | sdague: sure, but the analysis that it's truly unrelated doesn't exist, and is what you are proposing be postponed for 2.5 months | 16:43 |
sdague | which gives us the data | 16:43 |
*** sweston has joined #openstack-infra | 16:44 | |
sdague | jeblair: what I'm proposing is I like to actually sleep sometimes | 16:44 |
sdague | and not work every weekend | 16:44 |
sdague | so I'm being realistic | 16:44 |
jeblair | well, i can't argue with that | 16:44 |
*** SumitNaiksatam has quit IRC | 16:45 | |
sdague | if someone else wants to do this analysis earlier, I'm totally happy with that | 16:45 |
sdague | but I think it's part of the process | 16:45 |
*** basha has quit IRC | 16:45 | |
sdague | and with other commitments, it's going to take a while for me to get to it | 16:45 |
*** dims has quit IRC | 16:45 | |
fungi | lifeless: SpamapS: did you end up seeing instances appear? i've got more 35 nodepool nodes which claim to have been in a "building" state in your cloud for over half an hour now | 16:46 |
*** rpodolyaka has joined #openstack-infra | 16:46 | |
*** SumitNaiksatam has joined #openstack-infra | 16:46 | |
sdague | so I'm not saying we can't have that conversation until summit. I'm saying I'm not going to be able to drive it until then | 16:47 |
*** dstanek_afk has joined #openstack-infra | 16:47 | |
*** dstanek has quit IRC | 16:47 | |
*** dstanek_afk is now known as dstanek | 16:48 | |
fungi | lifeless: SpamapS: i think something must still be wrong on your end because i'm getting no instance id or ip address for any of them | 16:48 |
fungi | lifeless: SpamapS: is there perhaps something special which can break when trying to boot from a snapshot, which doesn't come into play when booting from a "normal" glance-uploaded image? | 16:49 |
*** yamahata has joined #openstack-infra | 16:50 | |
*** mriedem has quit IRC | 16:51 | |
*** rpodolyaka has quit IRC | 16:51 | |
*** sweston has quit IRC | 16:51 | |
jeblair | sdague: why not say "everything gates on what everything else gates on" which is more or less what we've been trying to do, and then remove those jobs one at a time if they don't fit? | 16:52 |
openstackgerrit | Doug Hellmann proposed a change to openstack-infra/config: Create ACL groups for oslo.rootwrap https://review.openstack.org/78667 | 16:52 |
jeblair | sdague: i think that leaves the configuration in a cleaner state, and one where it's much easier for us to review changes | 16:52 |
*** amcrn has quit IRC | 16:53 | |
*** packet has joined #openstack-infra | 16:53 | |
sdague | jeblair: because when we are near a milestone we're waiting 1.5 hrs for a devstack node to be allocated on check | 16:54 |
sdague | and adding a ton of additional jobs doesn't make that better | 16:55 |
*** gokrokve has quit IRC | 16:55 | |
*** bada has joined #openstack-infra | 16:56 | |
sdague | if that's what you want to do, so be it. my opinion is it's the wrong thing because we do actually have finite resources. And it makes me, as QA PTL, actually hold back on adding new parts of this matrix because it's already completely overloaded | 16:56 |
sdague | maybe that's the key tension | 16:56 |
lifeless | fungi: trying manually with tripleo-precise-1394069856.template.openstack.org | 16:57 |
openstackgerrit | A change was merged to openstack-infra/storyboard: Auth hotfix https://review.openstack.org/78653 | 16:57 |
sdague | I think we're back in starvation for our crunch times | 16:57 |
lifeless | fungi: image is in state spawning | 16:58 |
sdague | and I'm about to propose another devstack job on everything (the heat one) | 16:58 |
lifeless | fungi: and running | 16:58 |
lifeless | fungi: try nova list from the nodepool box? | 16:58 |
lifeless | | 63e19767-4a6c-42c1-a734-9c179a635730 | live-migration-test2 | ACTIVE | - | Running | default-net=10.0.58.232 | 16:58 |
sdague | so I don't think we should be running stuff we aren't sure has value, because our resources are actually finite | 16:58 |
jeblair | sdague: again, "gate-noop" is a much better target for your ire on that. | 16:58 |
*** rpodolyaka has joined #openstack-infra | 16:58 | |
*** jungleboyj has joined #openstack-infra | 16:59 | |
sdague | ok, so let's purge that as well | 16:59 |
sdague | how many nodes does that save' | 16:59 |
sdague | ? | 16:59 |
*** saper_ has quit IRC | 16:59 | |
sdague | nova lost half a day tuesday because of the time it took to get check results in reaction to upstream library release breaks | 16:59 |
jeblair | sdague: and yeah, there are resource issues and we're trying to fix them. we could continue to try to fix them, or we could give up and say that our cloud providers are incapable of providing the resources we need. | 17:00 |
*** eharney has joined #openstack-infra | 17:00 | |
*** dims has joined #openstack-infra | 17:00 | |
*** dstufft_ has joined #openstack-infra | 17:01 | |
sdague | jeblair: or we could be a little more careful about what we think we need, to keep some overhead available for when we need it | 17:01 |
lifeless | fungi: deleted it now | 17:01 |
fungi | sdague: we could stop using oslo libraries so that they don't break nova any more? i'm not quite sure what your point was on that statement | 17:01 |
sdague | fungi: my point wasn't that oslo broke us | 17:01 |
sdague | my point was we couldn't figure out if we fixed it in nova | 17:01 |
jungleboyj | I have a gate run that failed this morning in check-tempest-dsvm-neutron - setting up devstack with a bunch of 'no such file or directory' when trying to get packages. Any ideas what that might have been? | 17:01 |
*** harlowja has joined #openstack-infra | 17:02 | |
fungi | jungleboyj: without actual context, no. no idea whatsoever. do you have a link to the log? | 17:02 |
jungleboyj | fungi: Sure. Sorry not sure what all you guys need for context. :-) http://logs.openstack.org/94/72494/3/check/check-tempest-dsvm-neutron/7ec8b93/logs/devstacklog.txt.gz | 17:03 |
*** rcleere has joined #openstack-infra | 17:03 | |
*** basha has joined #openstack-infra | 17:03 | |
*** rpodolyaka has quit IRC | 17:03 | |
jogo | dims: ping, libvirt 1.x? | 17:03 |
sdague | but apparently I have a very different view on this one | 17:03 |
sdague | so I'm going to abandon that patch, because I don't think it's worth this much fight. | 17:03 |
jeblair | sdague: yeah, i get it, it's frustrating. but you're suggesting that we stop running tests that we want to run based on the performance of a degraded system. | 17:03 |
jeblair | sdague: i think we should fix the degradation | 17:04 |
jeblair | and run the tests we want to run | 17:04 |
*** dstufft has quit IRC | 17:04 | |
sdague | jeblair: no, I'm actually saying we stop running tests we aren't sure we want to run | 17:04 |
*** saper has joined #openstack-infra | 17:04 | |
sdague | you are assuming we know we want to run all those tests, and I actually want to have *that* conversation | 17:04 |
*** jcooley_ has quit IRC | 17:04 | |
sdague | but I don't have time to have it now, or build the data to make the right decisions | 17:04 |
jeblair | sdague: but you don't want to have it for 2.5 months | 17:04 |
sdague | jeblair: I actually want to have it now | 17:05 |
fungi | jungleboyj: that's definitely a new one on me... it looks like apt-get tried to write index files to disk and failed... in hpcloud-az2 as well... checking logstash to see how many jobs may have been impacted on other machines similarly in the past week | 17:05 |
jungleboyj | fungi: It is like the node's filesystem went bad or something when it was trying to set up the environment. Don't know what I can reverify against for that though. | 17:05 |
sdague | I don't have time to gather enough data to have it be useful now | 17:05 |
*** bada has quit IRC | 17:05 | |
jungleboyj | fungi: thank you! | 17:05 |
*** hashar has quit IRC | 17:05 | |
sdague | I'm totally happy to have it if someone else is willing to go collect that data | 17:05 |
jeblair | jogo: do you think large-ops should run on changes to all projects or just nova? | 17:06 |
jogo | jeblair: so it doesn't touch cinder, so not cinder for sure | 17:06 |
jogo | but in *theory* keystone, swift, nova, glance are all tested by it (and neutron for neutron version) | 17:07 |
jogo | and rootwrap of course | 17:07 |
*** mriedem has joined #openstack-infra | 17:07 | |
sdague | jogo: for keystone, swift, glance, will it catch issues the other jobs will not? | 17:07 |
jeblair | "I'm hoping we can get the new rate classes in place by end of day tomorrow (I'm an optimist), but I think worst-case would be next week." | 17:08 |
jeblair | fungi, sdague: ^ just got an update from rax about the limits | 17:08 |
fungi | jungleboyj: looks like it hit one other job in the past 7 days... http://logs.openstack.org/41/66541/23/check/check-tempest-dsvm-ironic-postgres-nv/9254081/logs/devstacklog.txt.gz | 17:08 |
sdague | jeblair: cool | 17:08 |
jogo | sdague: I *think* so. but if you want to run on nova only for resource reasons I am fine with that | 17:08 |
jogo | nova and devstack and tempest that is | 17:08 |
*** rwsu has joined #openstack-infra | 17:08 | |
fungi | jungleboyj: within a few minutes of yours | 17:08 |
*** sarob_ has quit IRC | 17:08 | |
jogo | sdague: so glance i doubt it will catch anything and swift too | 17:09 |
sdague | jogo: if you can find any change, ever, where it did find a failure mode we didn't catch in the full job, I'd be fine on running it those places. It's just not clear to me that it will. | 17:09 |
jogo | but keystone perhaps | 17:09 |
jogo | actually wait no | 17:09 |
fungi | jungleboyj: and also in hpcloud-az2 | 17:09 |
jeblair | i'm really against the idea that we ever choose not to run a test for resource reasons | 17:09 |
dims | jogo, pong re: 1.x libvirt - waiting for UCA team to update 1.2.x in icehouse/uca proposed deb repo | 17:09 |
jeblair | i'm okay with choosing not to run it because we know it's pointless | 17:09 |
jungleboyj | fungi: Interesting. So something environmental? | 17:09 |
jogo | sdague: so nova and neutron are the biggest risks for large-ops | 17:09 |
jogo | jeblair: ^ | 17:09 |
sdague | jeblair: so that decision gets made all the time | 17:10 |
*** gyee has joined #openstack-infra | 17:10 | |
jeblair | but "if you want to run on nova only for resource reasons" elicits from me: "no, not for resource reasons" | 17:10 |
jogo | jeblair: what is the reason? | 17:10 |
sdague | and if it's not made at the infra level, so infra is often pegged | 17:10 |
fungi | jungleboyj: i'm willing to bet hpcloud had some sort of issue with some of their storage in az2 around 06:25 | 17:10 |
fungi | (utc) | 17:10 |
sdague | it will get made at lower levels, because developers would rather not be waiting | 17:10 |
*** afazekas has joined #openstack-infra | 17:11 | |
jeblair | sdague: is it never okay to wait for a test result? | 17:11 |
SpamapS | fungi: good morning. | 17:11 |
jogo | dims: ack, let's move the discussion about this to nova room for a second | 17:11 |
sdague | jeblair: I'm not saying it isn't | 17:11 |
jeblair | sdague: i mean, if it's that important, why not run the test on your workstation. | 17:11 |
jungleboyj | fungi: Where do you see where it ran? | 17:11 |
*** andre__ has quit IRC | 17:11 | |
fungi | jungleboyj: at the top of the console log. we encode the provider, region and image type into the slave hostname | 17:11 |
sdague | but I'm saying that if people are waiting on infra a lot, they stop coming up with new interesting tests they want to put into the pool | 17:12 |
SpamapS | fungi: I do not see any instances running on our cloud for your tenant. | 17:12 |
fungi | SpamapS: yep, i'm trying to whip up a novaclient session on the nodepool server to see what's listed | 17:12 |
*** jcoufal has joined #openstack-infra | 17:12 | |
jeblair | sdague: you will convince me that we should not run useless tests. you will not convince me that we should change our goals in testing based on a temporary performance degradation in the clouds we use. | 17:12 |
jungleboyj | fungi: Ah, thank you. So, should I open a bug for this and reverify against that or is there a more appropriate solution? | 17:12 |
fungi | SpamapS: but for some reason nodepool thinks it launched 35 instances from that image about an hour ago when i cleared the old ones | 17:13 |
sdague | jeblair: aren't we running more quota than we ever ran? | 17:13 |
fungi | SpamapS: and seems to be waiting to hear back what the instance and ip address are | 17:13 |
*** e0ne has quit IRC | 17:13 | |
*** derekh has joined #openstack-infra | 17:13 | |
jeblair | sdague: of course, but we're using only a portion of what we could be because of the rax issue | 17:13 |
SpamapS | fungi: ok I'll look in the logs to see if we have errors | 17:14 |
fungi | jungleboyj: yes, that will work. whatever it was seems to be very brief (and over now) but it'll be useful for tracking purposes in case we see anything related from that timeframe | 17:14 |
sdague | jeblair: that was true when we hit the oslo.messaging issue? | 17:14 |
*** dstufft_ is now known as dstufft | 17:14 | |
sdague | I thought we were just solidly flat out on that | 17:14 |
jungleboyj | Ok. Thank you fungi ! | 17:14 |
jeblair | sdague: we've been bumping up against the rax servers/day quota daily for about a week | 17:14 |
sdague | jeblair: sure, but this was early in the day | 17:14 |
jeblair | sdague: was there a backlog caused by slow runs late in the day? | 17:15 |
sdague | based on what I saw in the check and gate queues, we wen're hitting quota | 17:15 |
sdague | I get there is also a quota issue | 17:15 |
jeblair | i should have said rate limit | 17:15 |
sdague | sure, rate limit | 17:15 |
sdague | I do actually understand that issue | 17:15 |
*** dkliban is now known as dkliban_lunch | 17:16 | |
jeblair | k. they are both at play so it's probably better to be clear. | 17:16 |
*** jcooley_ has joined #openstack-infra | 17:16 | |
sdague | from my recollection on when we were getting boned by this, it looked like we were running as hot as we could (weren't hitting the rate issue) | 17:16 |
jeblair | sdague: we wasted 220 nodes on gate-noop yesterday. | 17:16 |
sdague | over 1 day? | 17:16 |
jeblair | yep | 17:16 |
jogo | dims: do you need help prodding the UCA team? | 17:17 |
sdague | so that's 4% of nodes? | 17:17 |
jogo | zul: ^ | 17:17 |
zul | uh? | 17:17 |
jeblair | sdague: we ran 23863 jobs yesterday. | 17:17 |
sdague | ok, so 1%? | 17:17 |
jeblair | yeah | 17:18 |
sdague | so that's not much head room that it gives us | 17:18 |
jeblair | also. omg. :) | 17:18 |
sdague | so, sure, we should get rid of it | 17:18 |
jeblair | yeah. not a panacea tho. | 17:18 |
*** sdake_ has quit IRC | 17:18 | |
*** dkorolev has joined #openstack-infra | 17:19 | |
jeblair | at any rate, if you look at the node graph now, that really high orange bit is not normal. that's nodepool spinning on trying and failing to create nodes | 17:19 |
jeblair | (some of them probably tripleo, but a lot of them rax) | 17:19 |
sdague | jeblair: yep | 17:19 |
sdague | I agree with that, this morning's issue is different | 17:19 |
sdague | and not the thing I'm trying to solve | 17:19 |
sdague | so what I've seen is we're doing about 5x in check than in gate | 17:20 |
jeblair | and even when rax is letting us build nodes, if it's come after a sustained period where we could not, we're going to have a backlog to work through | 17:20 |
sdague | and gate is merging ~100 a day | 17:20 |
sdague | so adding a new job ends up being +600 devstack nodes a day | 17:20 |
*** sarob_ has joined #openstack-infra | 17:20 | |
sdague | if we run it across the integrated projects | 17:21 |
*** basha has quit IRC | 17:22 | |
SpamapS | fungi: nothing in our logs suggests errors. Let me try booting a snapshot. | 17:22 |
fungi | SpamapS: lifeless said he tried that a few minutes ago and it worked | 17:22 |
SpamapS | ah | 17:22 |
fungi | SpamapS: i wasn't seeing anything obvious in the nodepool debug log containing the novaclient request or response, but i'll look closer | 17:23 |
sdague | anyway, decide or not decide on the review. I think the data from jogo says the large ops jobs definitely shouldn't be run everywhere | 17:24 |
sdague | which is good data | 17:24 |
sdague | got to get to other things | 17:24 |
SpamapS | fungi: ok. I do see three snapshot images in your tenant... | 17:24 |
*** apevec has quit IRC | 17:24 | |
jogo | sdague: link? | 17:24 |
fungi | SpamapS: that sounds right. that's what i have from nodepool too | 17:24 |
sdague | jogo: https://review.openstack.org/#/c/78612/ | 17:25 |
*** reed has quit IRC | 17:25 | |
jeblair | jogo: would you be willing to propose a change on top of sdague's change https://review.openstack.org/#/c/78612/ that sets large-ops to run only where you think they should? | 17:25 |
*** sarob_ has quit IRC | 17:25 | |
*** gokrokve has joined #openstack-infra | 17:26 | |
fungi | SpamapS: the most debug-level data i have from nodepool on it looks like http://paste.openstack.org/show/72796/ (not especially helpful in this case) | 17:27 |
*** andre__ has joined #openstack-infra | 17:27 | |
*** nati_uen_ has quit IRC | 17:27 | |
fungi | SpamapS: i believe it tried to use snapshot 8799d365-c7e9-49ca-a363-4d0d2a1562ed which we think is named tripleo-precise-1394069856.template.openstack.org | 17:28 |
jogo | jeblair: sure, so one quick question -- what is the motivation for pruning where we run jobs? | 17:29 |
jogo | right now the job is running in extra places ... so cleaning it sounds good | 17:30 |
jeblair | jogo: aiui sdague thinks it's unfair to gate a project on a job that can't be affected by it | 17:30 |
jogo | ahh | 17:30 |
jogo | so in that case cinder shouldn't gate on it | 17:30 |
jogo | we don't test cinder in the job | 17:31 |
jogo | but ceilometer could break things | 17:31 |
jogo | not sure about currently -- but it used to inject code into nova | 17:31 |
*** jcooley_ has quit IRC | 17:31 | |
SpamapS | fungi: glance has a different ID for that snapshot | 17:32 |
SpamapS | fungi: | 117149f6-1bf6-45c4-9b24-149fe0ffe699 | tripleo-precise-1394069856.template.openstack.org | qcow2 | bare | 5078908928 | active | | 17:32 |
dansmith | are we having trouble with the largeops test? | 17:32 |
*** sabari3 has quit IRC | 17:32 | |
jeblair | dansmith: not that i'm aware of | 17:32 |
dansmith | https://jenkins03.openstack.org/job/gate-tempest-dsvm-large-ops/8611/console | 17:33 |
dansmith | jeblair: looks like a few of mine are failing on that test, which wouldn't be in that tested path, AFAIK | 17:33 |
*** talluri has joined #openstack-infra | 17:33 | |
*** sabari has joined #openstack-infra | 17:33 | |
fungi | oh joy... looks like nova.clouds.archive.ubuntu.com is breaking devstack jobs | 17:33 |
jeblair | what serendipitous timing | 17:34 |
jogo | dansmith: http://logs.openstack.org/88/76388/8/check/gate-tempest-dsvm-large-ops/9c40e55/logs/devstacklog.txt.gz#_2014-03-06_17_14_15_748 | 17:34 |
fungi | could be a mirror update in progress... "Hash Sum mismatch" on some of their indexes | 17:34 |
*** ildikov_ has quit IRC | 17:34 | |
dansmith | jogo: ah, thanks | 17:34 |
*** jnoller has joined #openstack-infra | 17:34 | |
*** amcrn has joined #openstack-infra | 17:34 | |
jogo | http://logs.openstack.org/88/76388/8/check/gate-tempest-dsvm-large-ops/9c40e55/logs/devstacklog.txt.gz#_2014-03-06_17_14_15_748 | 17:35 |
jogo | SpamapS: ^ | 17:35 |
*** jgallard has quit IRC | 17:35 | |
jeblair | sdague: i'm pretty sure we agree. we've always acknowledged that we're testing combinations that aren't necessary. but no one has wanted to do the analysis on that to determine which combos are necessary. | 17:36 |
clarkb | morning | 17:36 |
openstackgerrit | Doug Hellmann proposed a change to openstack-infra/config: Add gate jobs for oslo libraries https://review.openstack.org/76945 | 17:36 |
fungi | SpamapS: ahh, you're right, we have image id of 117149f6-1bf6-45c4-9b24-149fe0ffe699 for it... i was looking at the server id which it was built from | 17:36 |
jeblair | sdague: i think the issue i have is that your patch is the first step in that approach but you're saying you won't be doing that analysis for a while | 17:36 |
jeblair | sdague: so i'm worried about starting down that road in case we don't actually finish the trip | 17:36 |
clarkb | jeblair: I was thinking yesterday that the gerrit event stream should include replication event notifications | 17:37 |
sdague | jeblair: right, so my opinion is that it leads to 2 failure modes | 17:37 |
jeblair | clarkb: that would be 3rd way to solve it. :) | 17:37 |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Rename tempest.conf so it is gz'ed properly https://review.openstack.org/76622 | 17:38 |
*** vkozhukalov has joined #openstack-infra | 17:38 | |
*** briancurtin has left #openstack-infra | 17:38 | |
openstackgerrit | gordon chung proposed a change to openstack-infra/config: enable doc generation for pycadf https://review.openstack.org/75998 | 17:38 |
sdague | so my feeling is it's better to not default to 'run this job even if we don't know why it's useful' | 17:38 |
zaro | clarkb: take a look -> https://code.google.com/p/gerrit/issues/detail?id=2517 | 17:39 |
sdague | because as much as I know you believe that we should never consider resources as an issue | 17:39 |
sdague | I do | 17:39 |
jeblair | sdague: (fwiw, note the downward trend on building nodes, upward trend on workers and downward trend on waiting jobs which roughly correspond with when fungi put rax back into the config.) | 17:39 |
sdague | and I've stopped thinking about new ways we should be testing openstack as a whole because I don't feel we have the headroom for it | 17:39 |
*** jpich has quit IRC | 17:40 | |
fungi | well, and also to when i deleted 190 slaves stuck building for more than 8 hours in iad and dfw | 17:40 |
clarkb | zaro: woo | 17:40 |
jeblair | sdague: do you think we need more quota? | 17:40 |
clarkb | zaro: I guess we deal with it with your patch then | 17:40 |
*** gokrokve_ has joined #openstack-infra | 17:40 | |
fungi | and manually added py3k-precise and bare-centos6 to hpcloud az1 and az3 | 17:40 |
jeblair | sdague: and are you basing that on the behavior of the system for the past week or other times? | 17:41 |
sdague | I'm basing it on the behavior of the system the week of any milestone | 17:41 |
jeblair | (no matter what our quota and job demands are, if things break we're going to get behind) | 17:41 |
*** jlibosva has quit IRC | 17:41 | |
sdague | jeblair: right, but I think we need to not be idealistic and realize we've yet to handle a milestone without a break at this point :) | 17:41 |
davidlenwell | Howdy infra team! So this review is kicking back pep8 stuff but I'm not seeing it . https://review.openstack.org/#/c/78076/ should I have them re-review or something ? | 17:42 |
*** krotscheck has quit IRC | 17:42 | |
jeblair | sdague: you're really bumming me out. we're merging like tons of changes with like a couple hours delay in check results despite huge failures from our cloud providers. | 17:42 |
clarkb | davidlenwell: ImportError: cannot import name Feature you depend on a thing that uses a feature that was removed from setuptools | 17:42 |
clarkb | davidlenwell: which is :( | 17:43 |
jeblair | sdague: it's not perfect, but it's not nearly as terrible as you're making it out. | 17:43 |
sdague | jeblair: I'm not saying it's terrible | 17:43 |
*** esmute has quit IRC | 17:43 | |
davidlenwell | clarkb: :( that is sad | 17:43 |
clarkb | davidlenwell: markupsafe is the offender | 17:43 |
*** gokrokve has quit IRC | 17:43 | |
jeblair | sdague: i'm trying to not tune the system during a period where we know it's not behaving as it should, but you keep insisting that we do, so let's consider it | 17:44 |
openstackgerrit | David Lyle proposed a change to openstack/requirements: Adding support for Django 1.6 https://review.openstack.org/77015 | 17:44 |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: fix setting of default values for missing parameters in jenkins.ini file. https://review.openstack.org/78458 | 17:44 |
jeblair | sdague: none of the changes you are talking about would have affected the backlog we've seen due to the rax outage | 17:44 |
sdague | jeblair: sure | 17:44 |
lifeless | fungi: SpamapS: sorry I'm awolish - at the HP office in meetings | 17:44 |
jeblair | sdague: because that backlog was due to jobs running on nodes that we only run on rackspace | 17:44 |
sdague | so you're saying on Tuesday, there was a giant rax outage? | 17:44 |
davidlenwell | clarkb: where are you finding that? | 17:44 |
StevenK | lifeless: Palo Alto, or Moffett? | 17:45 |
*** rlandy is now known as rlandy|bbl | 17:45 | |
SpamapS | lifeless: np.. any clues as to what we should do next? | 17:45 |
sdague | sorry, I may not have been paying attention to that one. | 17:45 |
clarkb | davidlenwell: in the pep8 job log | 17:45 |
*** dangers_away is now known as dangers | 17:45 | |
lifeless | StevenK: moffett | 17:45 |
sdague | I'm not actually optimizing for what's happening right now | 17:45 |
jeblair | sdague: nodepool received 58630 OverLimit responses on tuesday | 17:45 |
jeblair | (utc tuesday) | 17:46 |
jogo | sdague jeblair: wow we did have large-ops in way too many places | 17:46 |
*** rcleere has quit IRC | 17:46 | |
sdague | jogo: thank you, this is why I wanted to revisit it | 17:46 |
lifeless | fungi: SpamapS: do we have a nodepool image-list for the cloud please? | 17:46 |
jogo | sdague: such as heat | 17:46 |
lifeless | that should list the external id | 17:46 |
fungi | lifeless: yes, pasting... | 17:46 |
jeblair | sdague: so yeah, i've been trying to convey that this has been a significant problem for a week. | 17:47 |
lifeless | which should be the cloud uuid for the thing | 17:47 |
sdague | jeblair: ok, that's fine | 17:47 |
*** rpodolyaka has joined #openstack-infra | 17:47 | |
sdague | so once we are running with headroom again, and it feels like we have it during a milestone like j1, I'll spend time thinking about new ways to test things | 17:48 |
openstackgerrit | Matt Ray proposed a change to openstack-infra/config: Add Ceph support to existing Chef cookbooks https://review.openstack.org/78681 | 17:48 |
jeblair | sdague: you've convinced me that what you want to do is a good thing. you've also convinced me that you are not going to do it. | 17:48 |
jeblair | how can i approve a patch like that? | 17:48 |
SpamapS | lifeless: I pasted the image UUID earlier | 17:48 |
SpamapS | fungi: | 117149f6-1bf6-45c4-9b24-149fe0ffe699 | tripleo-precise-1394069856.template.openstack.org | qcow2 | bare | 5078908928 | active | | 17:48 |
SpamapS | lifeless: ^^ | 17:49 |
fungi | SpamapS: lifeless: http://paste.openstack.org/show/72799/ | 17:49 |
*** zhiyan is now known as zhiyan_ | 17:49 | |
fungi | SpamapS: that seems to match the newest image i have listed from nodepool too | 17:49 |
fungi | SpamapS: lifeless: it's the one from the image-update i kicked off last night | 17:50 |
clarkb | sdague: I don't want to disrupt the other discussion but I think we need to revert https://review.openstack.org/#/c/66564/ there is a reason for not using a static instance and that is so that setup_logging will work as expected | 17:50 |
openstackgerrit | A change was merged to openstack-infra/gerritbot: Some README fixups, including git url https://review.openstack.org/78402 | 17:50 |
*** dkorolev has quit IRC | 17:51 | |
SpamapS | fungi: ok so should I see nodepool attempting to boot things? | 17:52 |
zaro | clarkb: question here, https://review.openstack.org/#/c/73687/1/modules/openstack_project/files/jenkins_job_builder/config/macros.yaml | 17:52 |
fungi | SpamapS: not at this point because it's still waiting to hear back from the 35 instances it started building | 17:52 |
zaro | clarkb: not sure what you mean by that. is that a bad thing? | 17:52 |
clarkb | zaro: yes, look at the xml diff job log | 17:52 |
lifeless | SpamapS: so they match up | 17:52 |
*** rcleere has joined #openstack-infra | 17:53 | |
lifeless | fungi: ^ | 17:53 |
*** sdake_ has joined #openstack-infra | 17:53 | |
fungi | i'm testing right now to see if i can make novaclient work sanely on nodepool.o.o for talking manually to your cloud so i can try to emulate the things it's trying to do | 17:53 |
lifeless | SpamapS: gotta run, another call calls. | 17:53 |
*** rcleere has quit IRC | 17:53 | |
*** reed has joined #openstack-infra | 17:54 | |
zaro | clarkb: are you suggesting that it should not be passed in with the macro? only passed in on direct call at the job level? | 17:55 |
clarkb | zaro: no I am saying the way that variable is used doesn't make it a variable | 17:55 |
clarkb | zaro: the jobs literally get {git-dir} passed to them | 17:55 |
*** rcleere has joined #openstack-infra | 17:55 | |
openstackgerrit | Ben Swartzlander proposed a change to openstack-infra/config: Implements: blueprint add-manila-to-gerritbot https://review.openstack.org/72872 | 17:56 |
zaro | clarkb: ohh you mean it should be ${git-dir}? | 17:56 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/config: Don't run large-ops test on repos that it doesn't touch https://review.openstack.org/78687 | 17:56 |
jogo | sdague jeblair: ^ | 17:56 |
clarkb | zaro: no, you have to pass git-dir somewhere for interpolation to happen | 17:57 |
clarkb | zaro: otherwise it doesn't get replaced and the '.' condition in the script is never used | 17:57 |
jogo | that was a conservative pruning | 17:58 |
zaro | clarkb: so why would it not interpolate in the macro? the macro.yaml contains other variables. | 17:59 |
clarkb | zaro: because you are not passing that as a variable anywhere | 18:00 |
*** reed has quit IRC | 18:01 | |
clarkb | interpolation only happens if git-dir is set somewhere. Otherwise it remains {git-dir} | 18:01 |
*** reed has joined #openstack-infra | 18:01 | |
fungi | SpamapS: i passed the two net-ids we have when running nova boot... default-net=10.0.58.233; tripleo-bm-test=192.168.1.94 | 18:01 |
clarkb | zaro: at least that is what appears to have happened according to the xml diffs | 18:01 |
fungi | SpamapS: those are all rfc-1918 | 18:01 |
SpamapS | | 512e73ca-3ec8-405f-88c2-631eacd2a875 | fungible | ACTIVE | - | Running | default-net=10.0.58.233; tripleo-bm-test=192.168.1.94 | | 18:03 |
SpamapS | fungi: you should have floating ips to attach | 18:03 |
zaro | clarkb: ok. i think i get it now. i'm not sure what the solution would be besides adding git-dir to every project and i don't think that's a viable solution. | 18:03 |
*** dkliban_lunch is now known as dkliban | 18:03 | |
clarkb | zaro: right, which is why I think we should just use the cd into dir then g-g-p solution that jeblair has suggested | 18:03 |
clarkb | zaro: see my cover comment | 18:03 |
SpamapS | cloud-init boot finished at Thu, 06 Mar 2014 18:00:28 +0000. Up 40.08 seconds | 18:03 |
fungi | SpamapS: oh, okay. i think i don't know how to do that. i'll see if i can figure it out. the nova command line is already pretty seriously scary... http://paste.openstack.org/show/72801/ | 18:03 |
zaro | clarkb: ohh missed the cover. | 18:03 |
zaro | clarkb: ok, read the cover. i agree, will abandon the change. | 18:05 |
*** dhellmann is now known as dhellmann_ | 18:06 | |
SpamapS | fungi: nova floating-ip-associate | 18:06 |
fungi | SpamapS: yeah, i have to floating-ip-create first looks like, according to floating-ip-list | 18:07 |
*** chandan_kumar has joined #openstack-infra | 18:07 | |
SpamapS | fungi: yes | 18:07 |
fungi | SpamapS: just out of curiosity, does tripleo have a development nodepool instance they're testing against that cloud to make sure that it's expected to be working? | 18:08 |
fungi | or is that me? | 18:08 |
*** andre__ has quit IRC | 18:08 | |
*** nati_ueno has joined #openstack-infra | 18:09 | |
jeblair | jogo: cool thanks | 18:09 |
davidlenwell | "Looks like the node went offline during the build. Check the slave log for the details.FATAL" on https://review.openstack.org/#/c/78683/ .. is stuff broken or is it me? | 18:09 |
*** khyati has joined #openstack-infra | 18:09 | |
fungi | SpamapS: and it's apparently add-floating-ip not floating-ip-associate | 18:10 |
jeblair | sdague: i think we can proceed with your patch, especially if we can get people to pitch in on reviewing layout changes until we've fully formulated a new policy | 18:10 |
anteaya | davidlenwell: https://bugs.launchpad.net/openstack-ci/+bug/1284371 | 18:10 |
sdague | jeblair: sounds good | 18:10 |
jogo | jeblair: didn't think the patch would be so big | 18:11 |
davidlenwell | anteaya: so what should I do with it? | 18:11 |
anteaya | davidlenwell: well you can recheck bug 1284371 | 18:12 |
anteaya | you can add a comment to the bug report | 18:12 |
anteaya | you can read the bug report and see if you have any insight into why that is happening | 18:12 |
sdague | jeblair: can't wait for new gerrit with secondary indexes that would make that simpler | 18:12 |
*** Ryan_Lane has joined #openstack-infra | 18:13 | |
sdague | I'll see if the email watch works on it | 18:13 |
jeblair | sdague: oh because of file level watches? | 18:13 |
sdague | yep | 18:13 |
jeblair | sdague: yeah, i think it works in email just not web | 18:13 |
sdague | yeh, it's supposed to, I haven't tried before | 18:13 |
sdague | I set one up now | 18:13 |
sdague | we'll see | 18:13 |
fungi | SpamapS: okay, confirmed i'm able to use novaclient on nodepool.o.o to attach a floating ip to the instance i booted | 18:14 |
fungi | SpamapS: so whatever nodepool's issue is, we probably need more debug logging to sort it out. i'll generate a stacktrace and see if i can tell whether something's hung in some way. no clue whether that will help but i might spot something | 18:15 |
davidlenwell | anteaya: alas .. I do not know why that would happen .. I've added my incident to the comments.. how do I make it recheck ..since im sure the problem wasn't on my end. | 18:16 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add a script to manage IRC perms https://review.openstack.org/78483 | 18:16 |
SpamapS | fungi: thanks for chasing it. We're very interested in getting CI back up so just ping if you need anything from me. | 18:16 |
anteaya | recheck bug 1284371 | 18:17 |
jeblair | SpamapS, fungi: is there an intentional behavior change (like we are using floating ips now but were not before)? | 18:17 |
fungi | jeblair: none i've been informed of. just "okay cloud's back up now! where all the nodes at?" | 18:18 |
*** hogepodge has joined #openstack-infra | 18:18 | |
jeblair | SpamapS: do you folks have a nodepool pointing at your own cloud to help debug these sorts of things? | 18:18 |
* fungi just asked that too... heh | 18:19 | |
jeblair | oh heh | 18:19 |
jeblair | i mean, when i'm debugging this sort of thing, i just run nodepool on my workstation and point it at rax or hp | 18:19 |
*** rpodolyaka has quit IRC | 18:20 | |
fungi | jeblair: current behavior is that nodepoold has started building nodes (or thinks it has) but has no instance id or ip address in its db for them, and just mentions in its debug log that it's building them but never anything else. confirmed i'm able to nova boot and attach a floating ip and ping an instance from nodepool.o.o using the same credentials and settings listed in the nodepool.yaml | 18:20 |
StevenK | pleia2: diff from 0.9.8-2ubuntu17 (in Ubuntu) to 1.2.2-0ubuntu1~precise1 (28.8 MiB) | 18:20 |
jeblair | fungi: when you take a stacktrace, the thread name will have the node id in it so you can see exactly where it's sitting in the process | 18:21 |
fungi | and nodepool image-update worked fine, built an image, marked it ready, i used that to boot the one i did manually | 18:21 |
jeblair | you probably know that | 18:21 |
fungi | yep, that's what i'm hoping will yield some additional detail | 18:21 |
clarkb | btw I haven't heard anything new from hp land this morning | 18:21 |
clarkb | I will bug people again | 18:21 |
SpamapS | jeblair: no, but it makes sense that we should spin one up so we can make sure nodepool will even work. | 18:21 |
fungi | oh, and nova list doesn't return any of the nodes nodepool thinks it has in a building state either, forgot to mention that | 18:22 |
sdague | jeblair: ok, so I'm going to propose the heat-slow add on top of that one then | 18:22 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: enable heat-slow in the integrated-gate https://review.openstack.org/78698 | 18:22 |
*** jcooley_ has joined #openstack-infra | 18:22 | |
jeblair | sdague: cool! | 18:22 |
clarkb | how slow is heat slow? | 18:23 |
clarkb | or is that just a name? | 18:23 |
openstackgerrit | A change was merged to openstack-infra/config: Add gate-murano-devstack job https://review.openstack.org/75078 | 18:23 |
sdague | it's faster than tempest-full | 18:23 |
sdague | it includes @slow jobs in tempest | 18:23 |
clarkb | nice | 18:23 |
sdague | which we normally exclude | 18:23 |
jeblair | so it's kind of a fast slow | 18:24 |
clarkb | ya so its a different set | 18:24 |
sdague | it's not a lot of jobs right now | 18:24 |
sdague | but it does actually bring up real versions of linux | 18:24 |
sdague | not just cirros | 18:24 |
clarkb | oh interesting | 18:24 |
openstackgerrit | Elizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add back old projects to replicate to git.o.o https://review.openstack.org/78490 | 18:24 |
clarkb | sdague: I wonder if that makes any difference with the ssh bug thing that has been a long standing problem | 18:24 |
clarkb | since cirros uses dropbear | 18:24 |
sdague | clarkb: yeh, that I don't know | 18:24 |
clarkb | it shouldn't matter but who knows | 18:24 |
sdague | I thought the ssh bug was mostly races on network connections | 18:25 |
clarkb | yeah probably | 18:25 |
clarkb | sdague: but if you read the logs the connection is made | 18:25 |
clarkb | anyways smarter people than I are debugging that one | 18:25 |
sdague | dansmith is working on an event callback api that neutron can call which will make that much better | 18:25 |
sdague | because it turns out that we are mostly passing tests because cirros dhcps 5 times | 18:26 |
sdague | before giving up | 18:26 |
dansmith | yep | 18:26 |
*** thedodd has quit IRC | 18:26 | |
*** krotscheck has joined #openstack-infra | 18:26 | |
sdague | and time 5 usually works (times 1, 2, and 3... not so much) | 18:26 |
dansmith | https://review.openstack.org/#/c/78052/ | 18:26 |
*** nicedice has joined #openstack-infra | 18:26 | |
jeblair | i don't understand 78052 | 18:27 |
*** zns has joined #openstack-infra | 18:28 | |
fungi | jeblair: clarkb: it's worth pointing out that we may want to scale back our max instances in ord... we're hitting ram quota limit around 63 nodes and nodepool thinks it's allowed to have 92 in there | 18:28 |
clarkb | nibalizer: help me out with https://review.openstack.org/#/c/76366/2 why does puppetdb do anything on the master? aren't they separate hosts? (trying to grok the interaction that happens there) | 18:28 |
dansmith | jeblair: it's just a stub to get the gate stuff to run with out yet-committed trees across three projects | 18:28 |
*** rpodolyaka has joined #openstack-infra | 18:28 | |
dansmith | s/out/our/ | 18:28 |
pleia2 | SpamapS: https://git.openstack.org/cgit/openstack-infra/nodepool/tree/README.rst | 18:28 |
jeblair | dansmith: i don't think that will work; the gate only runs what zuul decides | 18:28 |
sdague | this is one of those times where cross project zuul testing would be really useful | 18:28 |
dansmith | jeblair: it seems to be working | 18:29 |
dansmith | jeblair: we've fixed several things it caught that we hadn't reproduced yet | 18:29 |
nibalizer | clarkb: that class will configure the puppet master to use puppetdb | 18:29 |
jeblair | sdague: agreed; people have said they would work on that but i haven't seen anything. | 18:29 |
jeblair | dansmith: if that works then we have a really serious problem | 18:29 |
clarkb | nibalizer: and restart the puppet master when puppetdb changes are made? | 18:29 |
nibalizer | so it installs the puppetdb-terminus package, writes storeconfigs = puppetdb, and reports = puppetdb to puppet.conf and writes out an /etc/puppet/puppetdb.conf | 18:29 |
nibalizer | if you modify puppet.conf you gotta bounce puppet | 18:29 |
nibalizer | i think, it might be all smart about that now | 18:30 |
clarkb | nibalizer: gotcha so that change is just adding the bounce | 18:30 |
clarkb | everything else is already in place right? | 18:30 |
*** gyee has quit IRC | 18:30 | |
fungi | clarkb: yeah, the puppetdb log on the puppetdb.o.o server shows it's getting reports of everything puppet agents are doing now | 18:31 |
clarkb | cool /me approves | 18:31 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard: Make token storage configurable https://review.openstack.org/78188 | 18:31 |
fungi | on all our systems | 18:31 |
nibalizer | clarkb: thanks! | 18:31 |
clarkb | fungi: https://review.openstack.org/#/c/76280/ should that be abandoned / WIP until we kill grizzly? | 18:31 |
jeblair | dansmith: indeed, devstack did seem to checkout 74832 in http://logs.openstack.org/52/78052/4/check/check-tempest-dsvm-full/cfe4f7b/logs/devstacklog.txt.gz | 18:32 |
jeblair | dansmith: that's _really_ not supposed to happen | 18:32 |
jeblair | sdague, dtroyer: ^ | 18:32 |
dansmith | jeblair: roger that.. was just trying to find a trace that's only generated in the new code | 18:32 |
fungi | clarkb: yep, done | 18:32 |
dansmith | jeblair: you can credit arosen for finding that hole, I just said "oh, nifty" :D | 18:32 |
dansmith | jeblair: really wish I had --reset-author so I could get the credit :( | 18:33 |
jeblair | dansmith: yeah, i agree it's something we should be able to do; but the fact that it works puts gating in jeopardy | 18:33 |
dansmith | jeblair: okie | 18:33 |
lifeless | SpamapS: fungi: is the instance fungible fungi ? :) | 18:33 |
fungi | lifeless: so it seems ;) | 18:33 |
dansmith | jeblair: would really appreciate you not fixing it until we get this stuff in tho :D | 18:33 |
lifeless | fungi: so cloud works :) | 18:33 |
jeblair | dansmith: (basically, zuul needs to be responsible for setting up repos because setting them up the way zuul wants them to be is very complicated) | 18:34 |
openstackgerrit | Victor Stinner proposed a change to openstack/requirements: Block setuptools 3.0 to workaround a cffi bug https://review.openstack.org/78701 | 18:34 |
jeblair | dansmith: well, devstack-gate in this case | 18:34 |
dansmith | jeblair: yeah, I get it.. seems like a devstack patch would be able to break that, but nothing else, but IANAIP | 18:34 |
*** sabari has quit IRC | 18:34 | |
fungi | lifeless: the parts i exercised anyway. at this point i'm hoping the stack trace (once i can find and snip it out of the log full of other noise) will show we're blocking on something where nodepoold got confused at some point while the endpoint was unresponsive | 18:34 |
clarkb | jeblair: does ERROR_ON_CLONE need to be ERROR_ON_GIT instead? | 18:35 |
jeblair | dansmith: yes, that should be the case | 18:35 |
clarkb | hpk hopes to get to pull requests soon. Fingers are crossed we get concrete info on the tox thing. | 18:37 |
*** CaptTofu has joined #openstack-infra | 18:37 | |
*** jcooley_ has quit IRC | 18:37 | |
*** thuc has quit IRC | 18:39 | |
jeblair | dansmith, sdague, dtroyer: the checkout of refs seems to be fairly self-contained and doesn't look like it indicates a bug that could affect the gate, as long as devstack cores don't actually approve a change that sets the default branches to something with refs | 18:39 |
sdague | jeblair: so it looks like the logic in devstack in git_clone is pattern matching for what is probably a zuul ref | 18:39 |
*** thuc has joined #openstack-infra | 18:39 | |
*** dprince has joined #openstack-infra | 18:39 | |
sdague | jeblair: yeh, I think we're safe there :) | 18:39 |
jeblair | sdague: well, it would be any kind of ref, zuul or (in this case) raw gerrit | 18:40 |
sdague | we could probably check for it in our run_tests.sh just in case | 18:40 |
sdague | right ^ref/ | 18:40 |
sdague | sorry ^refs | 18:40 |
*** bhuvan has joined #openstack-infra | 18:40 | |
*** bhuvan_ has joined #openstack-infra | 18:40 | |
sdague | it is kind of a useful hack of the system for exactly this case though | 18:40 |
jeblair | sdague: yeah, i think it's probably safe to leave the facility in there as long as we are careful (and a test for it would be a good way to do that), at least until we can cross-depend in zuul | 18:41 |
sdague | yep | 18:41 |
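The safety check sdague and jeblair discuss above (flag any default branch setting that matches `^refs`) could be sketched as follows. This is only an illustration in Python, not the actual devstack or run_tests.sh code; the function name, the `*_BRANCH` naming filter, and the sample values are assumptions made for the example.

```python
import re

# A branch setting pointing at a raw ref (zuul or gerrit) starts with "refs".
REF_PATTERN = re.compile(r"^refs")


def branches_pinned_to_refs(settings):
    """Return the names of *_BRANCH settings whose value looks like a ref."""
    return sorted(
        name
        for name, value in settings.items()
        if name.endswith("_BRANCH") and REF_PATTERN.match(value)
    )


# Made-up sample settings; the ref below is not a real change.
sample = {
    "NOVA_BRANCH": "refs/changes/00/12345/1",
    "NEUTRON_BRANCH": "master",
    "UNRELATED_SETTING": "refs/heads/master",
}
offenders = branches_pinned_to_refs(sample)
```

A test along these lines would fail a change that tried to merge a ref as a default branch, while leaving the facility available for stub changes that are never approved.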
*** thuc_ has joined #openstack-infra | 18:42 | |
*** coolsvap has quit IRC | 18:43 | |
*** thuc has quit IRC | 18:43 | |
*** thuc__ has joined #openstack-infra | 18:43 | |
*** thuc__ has quit IRC | 18:44 | |
*** thuc has joined #openstack-infra | 18:44 | |
*** chuck__ has joined #openstack-infra | 18:45 | |
*** thuc_ has quit IRC | 18:47 | |
*** sweston has joined #openstack-infra | 18:47 | |
*** talluri has quit IRC | 18:50 | |
clarkb | sdague: with that sorted, can we fix e-r logging? | 18:50 |
*** rcarrillocruz1 has joined #openstack-infra | 18:51 | |
clarkb | sdague: want to make sure you think a revert is the right way to tackle that before we go doing that | 18:51 |
jogo | clarkb: working on a patch to fix the !logging part | 18:51 |
*** mwagner_lap has quit IRC | 18:51 | |
jogo | almost there | 18:51 |
clarkb | jogo: woot | 18:51 |
jogo | still getting dropped files | 18:51 |
*** talluri has joined #openstack-infra | 18:51 | |
clarkb | jogo: did you track that down to gerritlib? | 18:51 |
jogo | clarkb: worked around it | 18:51 |
clarkb | the TypeError | 18:51 |
sdague | jogo: great, point me to the patch when you get it up | 18:51 |
*** rcarrillocruz has quit IRC | 18:51 | |
jogo | sdague: will do | 18:51 |
fungi | jeblair: lifeless: SpamapS: every one of the node launcher threads for all the currently "building" tripleo nodes looks like http://paste.openstack.org/show/72809/ | 18:52 |
sdague | so - https://review.openstack.org/78701 is setuptools still in our mirror? | 18:52 |
jeblair | fungi: so look for the task manager for tripleo | 18:53 |
jeblair | fungi: they are all waiting for it to complete a task | 18:53 |
fungi | aha | 18:53 |
clarkb | sdague: looking | 18:53 |
sdague | clarkb: thanks | 18:53 |
sdague | because if so, we should purge it or we should put in the requirements block | 18:54 |
sdague | given that it's been pulled from pypi | 18:54 |
jeblair | fungi: possibly it's the old "dead connection isn't detected as dead because of lack of keepalive" thing | 18:54 |
sdague | also... pycon must be around the corner :) | 18:54 |
clarkb | sdague: that version is not in our mirror | 18:54 |
clarkb | sdague: we should be fine | 18:55 |
fungi | jeblair: yep! | 18:55 |
sdague | clarkb: ok, when did it drop? | 18:55 |
fungi | jeblair: lifeless: SpamapS: http://paste.openstack.org/show/72810/ | 18:55 |
clarkb | sdague: I don't understand the question. I don't think it was ever in our mirror | 18:55 |
*** banix has joined #openstack-infra | 18:55 | |
*** talluri has quit IRC | 18:55 | |
fungi | so basically it still has a socket open from before the cloud went offline (one of the times it went offline anyway, probably the first time since the last nodepoold restart) | 18:55 |
jeblair | fungi: so i've been getting an earful this morning about how unsatisfactory it is that we can't provide test nodes in a timely manner | 18:56 |
*** johnthetubaguy has quit IRC | 18:56 | |
annegentle | anyone know the right meaning of OS_TENANT_NAME for HP Cloud? For my credentials file to use nova CLI? | 18:56 |
annegentle | the project ID isn't wanted apparently | 18:56 |
clarkb | annegentle: one sec | 18:57 |
annegentle | I have a domain id and an account id guess I'll try those and process of elimination | 18:57 |
annegentle | clarkb: oo! | 18:57 |
jeblair | fungi: so i don't want to restart nodepool to fix that. i think it can wait until we need to restart for something else, or this weekend or next week. | 18:57 |
fungi | jeblair: makes sense | 18:57 |
SpamapS | is it not closing dead connections? | 18:57 |
*** mrodden has quit IRC | 18:57 | |
clarkb | annegentle: gah mine is on the other machine, a longer sec :) | 18:57 |
*** chuck__ has quit IRC | 18:57 | |
annegentle | hee | 18:58 |
fungi | SpamapS: it's not closing _a_ connection. one that is still established the last time it heard | 18:58 |
fungi | SpamapS: if the other end dropped the connection without sending tcp rst or fin, then it's just going to wait forever (or until the next time it's restarted) | 18:58 |
clarkb | annegentle: looks like it should be the project name | 18:58 |
SpamapS | fungi: so perhaps we could craft an RST ... | 18:59 |
jeblair | that would probably do it | 18:59 |
SpamapS | is it possible it _is_ still active on this end? | 19:00 |
*** dims has quit IRC | 19:00 | |
sdague | clarkb: ok, never mind, the largeops fails were the other thing | 19:00 |
annegentle | clarkb: huh. now I get Tenant not accessible. | 19:00 |
anteaya | so for tomorrow's installment of new project fridays we have: https://etherpad.openstack.org/p/new-projects-2014-03-07 thus far | 19:01 |
SpamapS | fungi: could you dig out the tcp connection details? we can try sendip on it. | 19:01 |
*** jcooley_ has joined #openstack-infra | 19:01 | |
*** sabari has joined #openstack-infra | 19:02 | |
*** dims has joined #openstack-infra | 19:02 | |
* SpamapS has used sendip before but never not as a joke.. | 19:02 | |
fungi | SpamapS: http://paste.openstack.org/show/72811/ | 19:02 |
*** hogepodge has quit IRC | 19:02 | |
sdague | man, the fact that nova is broken on vim in cloud archive - http://logs.openstack.org/10/77710/3/gate/gate-tempest-dsvm-large-ops/1608dbf/logs/devstacklog.txt.gz .... | 19:03 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard: Add superuser check https://review.openstack.org/77859 | 19:03 |
clarkb | sdague: is that dims' change to test cloud archive? | 19:04 |
openstackgerrit | A change was merged to openstack-infra/config: Set apache as the puppet service name https://review.openstack.org/76366 | 19:04 |
*** zns has quit IRC | 19:04 | |
fungi | SpamapS: if you're wanting to work on a patch, there's probably a couple of things we should be doing in nodepool for robustness when reusing client/provider connections... find a way to get tcp keepalives going on the socket (to handle cases where the provider endpoint dies silently) and also periodically recycle connections (to handle things like graceful endpoint moves using dns record changes) | 19:05 |
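The first robustness idea fungi mentions, TCP keepalives so that a silently dead provider endpoint is eventually noticed, can be sketched at the socket level. This is a minimal illustration, not a nodepool patch: `enable_keepalive` and its default timings are assumptions, and how nodepool would reach the socket underneath its client library is not addressed here.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    idle, interval and count are illustrative values: seconds of idleness
    before the first probe, seconds between probes, and failed probes
    tolerated before the connection is reported dead.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific, so only set the ones
    # this Python build exposes.
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With probes enabled, a peer that vanished without sending FIN or RST would cause the blocked call to fail after roughly `idle + interval * count` seconds instead of waiting forever.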
sdague | clarkb: yeh | 19:05 |
*** hogepodge has joined #openstack-infra | 19:05 | |
clarkb | sdague: I was really hoping it would just work :( | 19:06 |
sdague | clarkb: actually, no - https://review.openstack.org/#/c/77710/ | 19:06 |
sdague | maybe they just broke the update stream entirely | 19:06 |
sdague | however, there isn't really any reason we should be explicitly installing vim in devstack | 19:06 |
openstackgerrit | A change was merged to openstack-infra/storyboard: Make token storage configurable https://review.openstack.org/78188 | 19:06 |
*** zns has joined #openstack-infra | 19:06 | |
dansmith | fungi: should I be rechecking patches after that ubuntu mirror outage thing? | 19:07 |
dansmith | fungi: I don't want to generate more load if they're just going to fail | 19:08 |
fungi | dansmith: it looked like it was brief... i only saw it hit a few | 19:08 |
dansmith | fungi: cool, thanks | 19:08 |
*** arosen has joined #openstack-infra | 19:09 | |
fungi | dansmith: oh, it might still be ongoing? i see it affected some nova changes in the gate about 30 minutes ago | 19:10 |
dansmith | fungi: okay | 19:10 |
dansmith | fungi: well, the check queue isn't huge, so maybe it's okay if I do a few? | 19:10 |
fungi | dansmith: can't hurt | 19:11 |
*** skraynev is now known as skraynev_afk | 19:11 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/elastic-recheck: Revert "move to static LOG" https://review.openstack.org/78716 | 19:12 |
clarkb | sdague: ^ | 19:12 |
clarkb | jogo: ^ I suppose you are probably interested in that too | 19:13 |
*** mrodden has joined #openstack-infra | 19:13 | |
*** alexpilotti has joined #openstack-infra | 19:13 | |
openstackgerrit | A change was merged to openstack-infra/storyboard-webclient: Removed errant console statements. https://review.openstack.org/78411 | 19:14 |
sdague | clarkb: so instead of that, can we lazy load it? | 19:14 |
SpamapS | fungi: any chance that one of those connections shut down? | 19:14 |
sdague | sorry, I only just realized what the issue is | 19:14 |
clarkb | sdague: this is lazy loading the singleton objects | 19:14 |
*** alexpilotti has quit IRC | 19:15 | |
jogo | clarkb: so e-r is more broken than I thought :/ | 19:15 |
clarkb | do you want a LOG object with an internal logger that is filled when logger methods are used on LOG? | 19:15 |
*** alexpilotti has joined #openstack-infra | 19:15 | |
fungi | SpamapS: doesn't look like it | 19:15 |
fungi | SpamapS: still all the same port numbers on source and destination | 19:16 |
SpamapS | 19:16:15.563519 IP 192.237.211.91 > 138.35.77.16: ICMP host 192.237.211.91 unreachable - admin prohibited, length 48 | 19:16 |
SpamapS | I don't think it liked my RST | 19:16 |
SpamapS | 19:16:15.524872 IP 138.35.77.16.13774 > 192.237.211.91.60307: Flags [R], seq 819381088, win 65535, length 0 | 19:16 |
*** jcoufal has quit IRC | 19:17 | |
*** jcoufal_ has joined #openstack-infra | 19:17 | |
sdague | clarkb: ok | 19:17 |
sdague | +2 | 19:17 |
SergeyLukjanov | fungi, jeblair, clarkb, mordred, we've just selected the name for savanna, so, I'd like to request the repo renaming this weekend (it'll be really awesome, to have more time before the RC) | 19:18 |
fungi | SpamapS: i can try sending it from the host to itself, or from an adjacent vm if i can find one. what command-line options were you trying? | 19:18 |
anteaya | I just looked at my itinerary for my trip, i thought I left tomorrow night but I don't, I leave tonight | 19:18 |
SpamapS | fungi: sendip -v -p ipv4 -p tcp -ts 13774 -td 60307 -tfr 1 -tfs 0 -is 138.35.77.16 192.237.211.91 | 19:18 |
fungi | SpamapS: giving it a whirl | 19:19 |
annegentle | clarkb: ah, figured it out, had to activate services on my project | 19:19 |
clarkb | annegentle: oh yeah you have to do that for individual things in different regions | 19:19 |
*** sweston has quit IRC | 19:20 | |
fungi | SpamapS: i was trying to do something similar with hping3 the other day (to spoof tcp/rst packets in an attempt to close down gerrit client connections) but wasn't having much luck. never knew about sendip | 19:20 |
annegentle | clarkb: kind of nice for protecting my bill... workshop for sxsw Sunday | 19:20 |
annegentle | clarkb: :) cross cloud workshop | 19:20 |
*** thedodd has joined #openstack-infra | 19:20 | |
clarkb | annegentle: sounds like fun | 19:21 |
NobodyCam | Good morning infra quick question: is setuptools-3.0.2 newly uploaded this morning? | 19:21 |
clarkb | NobodyCam: yes and it should be gone now | 19:21 |
NobodyCam | lol | 19:21 |
fungi | NobodyCam: yes, dstufft gave us a heads up that it deprecated "Feature" in setup.py which some packages like cffi were still using | 19:22 |
fungi | s/deprecated/removed/ (it was already deprecated) | 19:22 |
SpamapS | fungi: I think we actually need the sequence number. | 19:22 |
NobodyCam | ahh ok ... :) thank you :) | 19:22 |
fungi | SpamapS: i agree, and ran out of time to figure out whether i could dig it out of the kernel | 19:22 |
openstackgerrit | Doug Hellmann proposed a change to openstack-infra/config: Add gate jobs for oslo libraries https://review.openstack.org/76945 | 19:22 |
SpamapS | fungi: right thats what I'm trying to determine | 19:23 |
anteaya | so that means I am not here tomorrow for new project friday | 19:23 |
SpamapS | fungi: check /proc/net/ip_conntrack | 19:23 |
*** e0ne has joined #openstack-infra | 19:24 | |
fungi | SpamapS: eureka! | 19:24 |
fungi | tcp 6 298488 ESTABLISHED src=192.237.211.91 dst=138.35.77.16 sport=60307 dport=13774 src=138.35.77.16 dst=192.237.211.91 sport=13774 dport=60307 [ASSURED] mark=0 use=2 | 19:24 |
lifeless | fungi: can you send it a RST ? | 19:25 |
StevenK | fungi: My fault | 19:25 |
fungi | i assume 298488 is the ipseq | 19:25 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/config: Added krotscheck as a user to storyboard.openstack.org https://review.openstack.org/78723 | 19:25 |
lifeless | om over to the sprint | 19:25 |
fungi | lifeless: that's what we're trying to figure out | 19:25 |
SpamapS | fungi: so I think the RST needs to be that +1 | 19:25 |
SpamapS | fungi: sendip -v -p ipv4 -p tcp -ts 13774 -td 60307 -tfr 1 -tfs 0 -tn 298489 -is 138.35.77.16 192.237.211.91 | 19:26 |
SpamapS | I'm still getting ICMP denials from the actual server | 19:26 |
SpamapS | lifeless: you're here or heading here now? | 19:27 |
*** sarob_ has joined #openstack-infra | 19:27 | |
mtreinish | clarkb: I'm too lazy to do a diff, what's the difference between: https://review.openstack.org/#/c/78716/1 and https://review.openstack.org/#/c/78485/1 ? | 19:28 |
*** gyee has joined #openstack-infra | 19:29 | |
fungi | SpamapS: no luck from this end either (locally on the machine or from another host in the same region). iptables might be doing some sort of anti-spoofing to prevent egress of these packets... i'll fire up tcpdump shortly | 19:29 |
clarkb | mtreinish: one passes pep8 and is the result of a revert. The other was my first stab at it, I will abandon the one that isn't a revert | 19:29 |
clarkb | mtreinish: basically better book keeping in one | 19:29 |
*** lcheng has joined #openstack-infra | 19:30 | |
clarkb | mtreinish: I should've used the same change id honestly | 19:30 |
mtreinish | clarkb: heh, ok I was reviewing the revert and thought it looked familiar so I got confused | 19:30 |
jogo | clarkb: I am going to rebase my WIP to fix ER on your patch and we should have a working e-r again | 19:30 |
clarkb | jogo: great | 19:31 |
clarkb | mtreinish: sorry for the confusion | 19:31 |
clarkb | pleia2: speaking of nodepool fedora. https://review.openstack.org/#/c/78440/ is a thing and I couldn't get centos6 nodes running in hpcloud yesterday. I am about to try again to test that change. If that continues to fail any chance you have a node that we can edit /etc/default/grub on then update-grub, reboot, take a snapshot then boot from the snapshot? | 19:33 |
mtreinish | clarkb: no it's good, it keeps me on my toes. | 19:33 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Revert "move to static LOG" https://review.openstack.org/78716 | 19:33 |
*** sweston has joined #openstack-infra | 19:33 | |
pleia2 | clarkb: having a look | 19:34 |
*** dhellmann_ is now known as dhellmann | 19:34 | |
SpamapS | fungi: ty.. we're having debates about whether or not we can just do a very high sequence and whether ack number is important | 19:35 |
greghaynes | What happens when you mention tcp innards to a room of nerds | 19:35 |
*** hogepodge has quit IRC | 19:36 | |
fungi | SpamapS: yeah, it looks like i can't send it remotely (never arrives) but locally on the interface i get several which look like | 19:36 |
fungi | 19:35:05.757062 IP 138.35.77.16.13774 > 192.237.211.91.60307: Flags [R.], seq 2147916301, ack 999999994, win 65535, length 0 | 19:36 |
clarkb | pleia2: ya I still get 2014-03-06 19:36:26,611 - DataSourceEc2.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [49/120s]: url error [[Errno 113] No route to host] I am going to try a west instead of east | 19:36 |
clarkb | also the centos6 image has a 5 second grub timeout | 19:37 |
fungi | SpamapS: oh, actually, o | 19:37 |
* clarkb should find out if we can have images that just boot | 19:37 | |
pleia2 | clarkb: doh | 19:37 |
fungi | SpamapS: actually i'm not receiving the ones i'm sending either. i think it was your attempts i was picking up via tcpdump | 19:37 |
*** apevec has joined #openstack-infra | 19:38 | |
*** dkliban has quit IRC | 19:38 | |
SpamapS | fungi: I've sent a few... | 19:38 |
fungi | SpamapS: yeah, i captured three | 19:39 |
*** e0ne has quit IRC | 19:39 | |
pleia2 | clarkb: I'll give it a try (my vms are on west at the moment anyway) | 19:39 |
fungi | SpamapS: which means they are arriving | 19:39 |
*** e0ne has joined #openstack-infra | 19:39 | |
*** mrodden1 has joined #openstack-infra | 19:40 | |
*** jp_at_hp has quit IRC | 19:40 | |
*** hogepodge has joined #openstack-infra | 19:41 | |
SpamapS | fungi: just sent this one | 19:41 |
SpamapS | 19:41:35.760724 IP 138.35.77.16.13774 > 192.237.211.91.60307: Flags [R], seq 298489, win 65535, length 0 | 19:41 |
fungi | yep, saw it | 19:42 |
SpamapS | fungi: ok.. and when you try to send it, you get the same ICMP denial? | 19:42 |
*** mrodden has quit IRC | 19:42 | |
SpamapS | sendip -v -p ipv4 -p tcp -ts 13774 -td 60307 -tn 298489 -tfr 1 -tfs 0 -is 138.35.77.16 192.237.211.91 | 19:42 |
fungi | SpamapS: i get no reply packet at all, just silence | 19:42 |
*** jcoufal_ has quit IRC | 19:42 | |
SpamapS | fungi: conntrack still showing the same thing? | 19:43 |
tchaypo | pleia2: I don't have a review button on that change | 19:43 |
fungi | SpamapS: seq is 296009 now | 19:43 |
fungi | SpamapS: wait, wrong socket. 297350 | 19:44 |
ttx | fungi: anything wrong with gate right now ? I kinda need 78670,1 to cut I3 | 19:44 |
ttx | and the top looks a bit funny | 19:45 |
pleia2 | clarkb: so the change will be different in centos because /etc/default/grub doesn't exist | 19:46 |
fungi | ttx: 77710 failed a while ago on a rackspace ubuntu mirror issue | 19:46 |
pleia2 | tchaypo: should! I don't know why you wouldn't :( | 19:46 |
clarkb | pleia2: that is what I was afraid of but couldn't verify because EMETADATASERVER | 19:46 |
fungi | ttx: 77941 looks like a nondeterministic bug somewhere raised in a tempest test | 19:46 |
SpamapS | fungi: no that is not sequence | 19:46 |
ttx | fungi: ok so there may still be hope for 78670 | 19:46 |
clarkb | pleia2: is centos6 grub1? | 19:46 |
SpamapS | fungi: that is seconds until the entry is deleted | 19:46 |
fungi | SpamapS: i was about to say the same | 19:46 |
fungi | SpamapS: it's timeout | 19:46 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Unbreak elastic-recheck https://review.openstack.org/78732 | 19:46 |
fungi | SpamapS: what you just said | 19:46 |
*** zhiyan_ is now known as zhiyan | 19:46 | |
jogo | clarkb sdague: doing final round of testing on that ^ | 19:47 |
* fungi is juggling too many irons in separate fires | 19:47 | |
* jogo is waiting for a failure | 19:47 | |
fungi | ttx: i think so, yes | 19:47 |
clarkb | ttx: yup I think there is hope | 19:47 |
jogo | just got a live one | 19:47 |
jogo | hopefully all this ER activity won't crash ES | 19:48 |
jogo | clarkb sdague: it worked | 19:48 |
pleia2 | clarkb: 1:0.97-77.el6 ugh | 19:48 |
jogo | - gate-tempest-dsvm-full: https://bugs.launchpad.net/bugs/1254872 | 19:48 |
clarkb | jogo: it looks happyish. I mean we doubled its size so have room to grow | 19:48 |
jogo | ER lives again | 19:48 |
clarkb | pleia2: woot! | 19:48 |
clarkb | pleia2: I mean :/ | 19:48 |
fungi | SpamapS: i'm going to see if conntrack-tools will get us some relief | 19:48 |
jogo | clarkb: well now ER will actually start DOSing it again | 19:48 |
jogo | once my fix lands | 19:48 |
clarkb | jogo: oh good point | 19:48 |
clarkb | jogo: well whatever | 19:48 |
dstufft | fungi: setuptools 3 got pulled from PyPI a bit ago | 19:49 |
clarkb | pleia2: what about fedora? | 19:49 |
SpamapS | fungi: ok. One thought, the RST only has to be within window-size of the sequence number.. | 19:49 |
clarkb | pleia2: I am guessing that is grub2 and has /etc/default/grub | 19:49 |
SpamapS | fungi: so if window is 65535 we don't have that many windows to try.. | 19:49 |
fungi | SpamapS: very true | 19:49 |
pleia2 | clarkb: spinning up a new fedora now to check | 19:49 |
openstackgerrit | A change was merged to openstack-infra/config: Add notifications to #openstack-oslo channel https://review.openstack.org/76598 | 19:50 |
dhellmann | \o/ | 19:50 |
jeblair | fungi, clarkb: i'm kind of thinking that i don't really want to review channel-level acl changes... so maybe i should rework the accessbot to set perms for global things, and then maybe revoke +F,etc from anyone else, but otherwise leave things be? | 19:52 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/config: Added Authorization Header flag to storyboard module https://review.openstack.org/78734 | 19:52 |
fungi | jeblair: that seems like it would be less work in the long run. sounds good to me | 19:53 |
clarkb | jeblair: wfm | 19:53 |
mordred | jeblair: ++ | 19:54 |
fungi | SpamapS: conntrack was no help. it allowed me to delete the connection tracking entries from the table, but did not actually close out the established sockets associated with them :( | 19:55 |
jeblair | fungi: that might have been anti-help | 19:55 |
jeblair | fungi: you might not be able to get a packet through the firewall now? | 19:55 |
fungi | jeblair: yeah, i can re-add them though | 19:55 |
SpamapS | fungi: and that may now prevent me from actually getting an RST through | 19:55 |
pleia2 | clarkb: heh, of course fedora does not really use /etc/default so much (it has a couple things in it, but it doesn't seem to be the way they do things) | 19:56 |
*** markwash has joined #openstack-infra | 19:56 | |
lifeless | fungi: http://killcx.sourceforge.net/ | 19:56 |
clarkb | pleia2: :) so my change as is is safe for precise but nothing else | 19:56 |
pleia2 | clarkb: seems so | 19:57 |
*** nicedice_ has joined #openstack-infra | 19:57 | |
pleia2 | I'll dig into this and see what we need to do in rh | 19:57 |
*** wchrisj_ has joined #openstack-infra | 19:57 | |
clarkb | I think I managed to get a centos host in west | 19:57 |
jeblair | clarkb, pleia2: does it break fedora/centos or just not work there? | 19:57 |
fungi | lifeless: no help on a dead socket "...one needs to sniff the connection and extract the magic Acknowlegment and Sequence numbers from a TCP packet..." | 19:57 |
clarkb | jeblair: it would fail the image build | 19:57 |
jeblair | k | 19:57 |
clarkb | jeblair: because the file and update-grub aren't a thing on centos | 19:57 |
lifeless | fungi: would it hurt to try? | 19:58 |
fungi | lifeless: um | 19:58 |
fungi | lifeless: i don't have those pieces of data, and have no way to obtain them. try what exactly? | 19:58 |
lifeless | fungi: it sends a syn to the socket | 19:58 |
lifeless | fungi: and uses that to figure out what to send to kill it | 19:58 |
fungi | lifeless: i get that. just guess something? | 19:58 |
*** dstanek_afk has joined #openstack-infra | 19:59 | |
lifeless | fungi: | 19:59 |
lifeless | Killcx works by creating a fake SYN packet with a bogus SeqNum, spoofing the remote client IP/port and sending it to the server. It will fork a child process that will capture the server response, extract the 2 magic values from the ACK packet and use them to send a spoofed RST packet. The connection will then be closed. | 19:59 |
fungi | it sounds like that part only actually works with windows, suggesting linux doesn't respond to its probe | 19:59 |
lifeless | fungi: it describes how it works on linux | 19:59 |
*** dstanek has quit IRC | 20:00 | |
*** dstanek_afk is now known as dstanek | 20:00 | |
fungi | oh, i see. their instructions sounded like i needed to get those values | 20:00 |
jeblair | fungi: i thought that too | 20:00 |
fungi | worth a try, but first i probably need to restore the connection tracking entries | 20:00 |
*** openstackgerrit has quit IRC | 20:00 | |
*** openstackgerrit_ has joined #openstack-infra | 20:00 | |
SpamapS | fungi: would you mind 2.9 million RST's sent at that machine from ours? | 20:00 |
*** openstackgerrit_ is now known as openstackgerrit | 20:01 | |
SpamapS | that would hit every possible window | 20:01 |
*** wchrisj__ has joined #openstack-infra | 20:02 | |
*** echohead has joined #openstack-infra | 20:03 | |
*** wchrisj_ has quit IRC | 20:03 | |
*** nicedice has quit IRC | 20:03 | |
*** wchrisj has quit IRC | 20:03 | |
*** harlowja has quit IRC | 20:03 | |
*** mrodden1 has quit IRC | 20:03 | |
*** echohead_ has quit IRC | 20:03 | |
*** mrodden has joined #openstack-infra | 20:03 | |
*** mrodden has quit IRC | 20:04 | |
*** mrodden has joined #openstack-infra | 20:04 | |
*** harlowja has joined #openstack-infra | 20:04 | |
*** smarcet has joined #openstack-infra | 20:04 | |
lifeless | SpamapS: IF we're not facing NAT | 20:04 |
mordred | jeblair: I believe we're ready for an ssl cert for storyboard.openstack.org - you normally get those, yeah? | 20:04 |
jeblair | mordred: https://review.openstack.org/#/c/76407/ | 20:04 |
mordred | jeblair: wow. you're amazing | 20:04 |
SpamapS | lifeless: facing NAT? | 20:04 |
SpamapS | lifeless: there's no NAT. | 20:04 |
krotscheck | Neat! | 20:05 |
lifeless | SpamapS: on the nodepool server? | 20:06 |
*** rlandy|bbl is now known as rlandy | 20:06 | |
SpamapS | lifeless: the TCP connection I was shown shows the real IP's on both sides. | 20:06 |
fungi | lifeless: SpamapS: okay, the conntrack entries for those two sockets are restored | 20:07 |
*** vkozhukalov has quit IRC | 20:08 | |
krotscheck | jeblair: Could you fix storyboard for us? https://review.openstack.org/#/c/78734/ | 20:09 |
SpamapS | fungi: I'll try with a big window size first | 20:09 |
*** dstanek is now known as dstanek_afk | 20:09 | |
lifeless | SpamapS: ok what are the conn details ? | 20:11 |
*** sweston has quit IRC | 20:11 | |
*** eharney has quit IRC | 20:12 | |
Ng | do you guys know about cutter? | 20:13 |
*** malini is now known as malini_afk | 20:13 | |
*** mrodden1 has joined #openstack-infra | 20:14 | |
openstackgerrit | A change was merged to openstack-infra/config: Added Authorization Header flag to storyboard module https://review.openstack.org/78734 | 20:14 |
*** mrodden has quit IRC | 20:15 | |
lifeless | fungi: whats the ip and port # ? | 20:15 |
*** ihrachys is now known as ihrachys|afk | 20:15 | |
SpamapS | fungi: anything? | 20:16 |
fungi | lifeless: http://paste.openstack.org/show/72811/ | 20:16 |
lifeless | fungi: thanks | 20:17 |
SpamapS | i just hit all 65535 byte windows | 20:17 |
fungi | SpamapS: no change... still there | 20:17 |
lifeless | fungi: tried killcx ? | 20:17 |
fungi | i'm about set with the requirements for that lifeless, yes | 20:17 |
lifeless | fungi: cool. fingers crossed. | 20:17 |
fungi | still working on it | 20:17 |
*** mriedem has quit IRC | 20:19 | |
fungi | lifeless: SpamapS: that seems to have done it--good find! | 20:20 |
lifeless | tada | 20:20 |
fungi | short little program too... didn't take too long to audit | 20:20 |
StevenK | fungi: Can you paste the output, out of interest? | 20:21 |
SpamapS | fungi: killcx got it? | 20:21 |
fungi | StevenK: SpamapS: yep! http://paste.openstack.org/show/72814/ | 20:23 |
krotscheck | jeblair: Thanks! | 20:23 |
*** ociuhandu has quit IRC | 20:24 | |
fungi | SpamapS: lifeless: StevenK: it's not packaged in precise, but worked with libnet-rawip-perl libnet-pcap-perl and libnetpacket-perl on precise | 20:24 |
*** mriedem has joined #openstack-infra | 20:24 | |
fungi | that's a useful one to keep up the sleeve for future use. something i've occasionally wanted to be able to do on linux and never found a good tool for | 20:25 |
StevenK | It's not packaged at all, as far as I can tell | 20:25 |
fungi | pro'lly not | 20:25 |
*** rlandy has quit IRC | 20:26 | |
StevenK | Perl modules are incredibly easy to package up, so I may just accidentally do it. | 20:27 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Make storyboard run over ssl https://review.openstack.org/78747 | 20:27 |
pleia2 | clarkb: so fedora cloud images don't actually have grub2 packages, they have a grubby package which is used to update the grub.cfg which it seems to use, but I can't get it booting in a way that respects the mem=8G | 20:28 |
fungi | StevenK: it's useful enough i'd get it into sid if i had time, but you know... time | 20:28 |
pleia2 | clarkb: will dig more after lunch | 20:28 |
clarkb | pleia2: thank you | 20:29 |
clarkb | pleia2: I think I mostly have centos sorted | 20:29 |
*** jcoufal has joined #openstack-infra | 20:29 | |
*** eharney has joined #openstack-infra | 20:30 | |
*** zns has quit IRC | 20:32 | |
jomara | fungi: quick git question for you, i have a question pertaining to my 'situation' yesterday - of the 5 patches, #1 was merged, #5 was decoupled, and #2 was found to have a spelling error i need to fix. am i save to just edit #2 & git-review that, or should i edit #2, then cherry pick #3 & #4, and git-review that? | 20:32 |
jomara | s/save/safe | 20:32 |
*** zns has joined #openstack-infra | 20:34 | |
jomara | im a little gunshy after yesterday :) | 20:34 |
krotscheck | Monty wants me to share this: https://www.destroyallsoftware.com/talks/wat | 20:34 |
mordred | fungi, jeblair: ^^^ | 20:34 |
fungi | jomara: in your old topic branch, rebase -i on the sha for #1, change #2 from pick to edit, fix your spelling error, git commit -a --amend, git rebase --continue, git review | 20:34 |
jomara | ok, got it | 20:35 |
jomara | that is the same thing you wanted me to do yesterday (which i ended up aborting, because i had to decouple the 5th patch instead of preserving it) | 20:35 |
jomara | thanks | 20:36 |
fungi | jomara: if #5 is decoupled, then assuming your old topic branch still had it depending on the others, you'll want to git reset --hard to patch #4 before starting the rebase (and git stash first if you had anything else being edited in there you hadn't committed yet) | 20:38 |
*** dangers is now known as dangers_away | 20:38 | |
jomara | fungi: my old topic branch ends in #4, so i should be ok | 20:38 |
jomara | also now that ive started this it makes perfect sense, thanks | 20:38 |
fungi | oh, perfect | 20:38 |
* clarkb is going to shave 5 seconds off of our node boots | 20:38 | |
clarkb | that is a lot of seconds at 20k servers a day | 20:38 |
fungi | jomara: also you can 'git review -d NNNNN' the review number of change #4 if you need a fresh topic branch for it and its dependencies | 20:39 |
*** rossella_s has quit IRC | 20:40 | |
*** rpodolyaka has left #openstack-infra | 20:40 | |
*** rlandy has joined #openstack-infra | 20:41 | |
*** yolanda_ has quit IRC | 20:42 | |
*** rossella_s has joined #openstack-infra | 20:42 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Limit nodepool nodes to 8GB of RAM https://review.openstack.org/78440 | 20:42 |
clarkb | pleia2: ^ that covers ubuntu and CentOS | 20:42 |
clarkb | pleia2: left a blank spot in there for Fedora, feel free to push a patchset that addresses Fedora (and add yourself as a co author in the commit) | 20:43 |
*** jswarren has quit IRC | 20:43 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Limit nodepool nodes to 8GB of RAM https://review.openstack.org/78440 | 20:47 |
clarkb | pleia2: ^ that will actually edit grub.conf | 20:47 |
*** sweston has joined #openstack-infra | 20:49 | |
fungi | to answer SergeyLukjanov's earlier question, i'm open to rename savanna projects this weekend if he's got the config change for that drafted up. anyone else going to be around? if so, we should also give DinaBelova and zhiwei a heads up in case they want their projects renamed at the same time | 20:49 |
clarkb | fungi: I will be around | 20:49 |
clarkb | so lets do that | 20:50 |
*** Ryan_Lane has quit IRC | 20:50 | |
SergeyLukjanov | fungi, clarkb, thank you, I need to clarify that we're ready to do it this weekend, so, will be ready to ack it tomorrow | 20:52 |
*** bhuvan_ has quit IRC | 20:53 | |
*** bhuvan has quit IRC | 20:53 | |
*** ildikov_ has joined #openstack-infra | 20:54 | |
*** sandywalsh has quit IRC | 20:55 | |
*** eharney has quit IRC | 20:56 | |
*** e0ne has quit IRC | 20:56 | |
clarkb | jogo: I have approved your e-r fixes. I will keep an eye on it | 20:56 |
fungi | SpamapS: lifeless: i temporarily bumped the min-ready for tripleo-precise nodes on each jenkins master by one, so that they'll all get your jobs registered again, however nova list takes a crazy log time to return and shows a bunch of instances in an error state. is something still wrong there? | 20:57 |
fungi | s/log/long/ | 20:57 |
*** e0ne has joined #openstack-infra | 20:57 | |
*** krotscheck has quit IRC | 20:58 | |
*** bhuvan has joined #openstack-infra | 20:59 | |
*** bhuvan_ has joined #openstack-infra | 20:59 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Unbreak elastic-recheck https://review.openstack.org/78732 | 20:59 |
*** SumitNaiksatam has quit IRC | 21:00 | |
*** SumitNaiksatam has joined #openstack-infra | 21:00 | |
clarkb | jeblair: fungi mordred SergeyLukjanov for stuff like https://review.openstack.org/#/c/77376/ do we want to discuss that more than two +2s and a +A? I feel slightly bad approving something like that without more discussion/agreement | 21:01 |
*** krotscheck has joined #openstack-infra | 21:01 | |
mordred | clarkb: I agree with more discussion, although I also agree with the patch | 21:02 |
jeblair | ++ | 21:02 |
SergeyLukjanov | clarkb, agree with extended discussion | 21:02 |
fungi | it probably should go on the -dev ml in a dedicated thread with [3rd-party ci] subject tag or something | 21:03 |
clarkb | great. fungi we await your vote (no rush) | 21:03 |
fungi | because input from the people we're imposing this new requirement on is at least some of what we need, as well as those impacted by not imposing the requirement | 21:03 |
clarkb | fungi: ++ want to suggest that on the change? | 21:04 |
*** sweston has quit IRC | 21:04 | |
jogo | clarkb: thanks, I went to lunch so glad someone was tracking it | 21:04 |
clarkb | jogo: the service should restart here shortly I think | 21:04 |
jogo | wendar: https://review.openstack.org/78732 BTW | 21:05 |
clarkb | if it doesn't I will go digging | 21:05 |
*** bhuvan_ has quit IRC | 21:05 | |
*** bhuvan has quit IRC | 21:05 | |
wendar | jogo: yeah, I was just looking it over, looks good! | 21:05 |
*** ociuhandu has joined #openstack-infra | 21:05 | |
*** andreaf has joined #openstack-infra | 21:06 | |
*** dzimine has joined #openstack-infra | 21:06 | |
lifeless | fungi: neutron is fail | 21:06 |
lifeless | fungi: it will come good eventually | 21:06 |
fungi | lifeless: funzies | 21:06 |
lifeless | jogo: ^ | 21:06 |
dzimine | folks, we have a problem on stackforge with Mistral | 21:07 |
dzimine | a few commits that are failing with the exact same error: | 21:07 |
dzimine | https://review.openstack.org/#/c/78228/ | 21:07 |
dzimine | https://review.openstack.org/#/c/78599/ | 21:07 |
dzimine | https://review.openstack.org/#/c/77126/ (note that this one is failing at “verify” phase) | 21:07 |
dzimine | We looked at the logs and looks like it's CI itself: at least we can't find a problem on our side. | 21:08 |
dzimine | Any help? Thanks in advance!! | 21:08 |
fungi | dzimine: "ImportError: cannot import name Feature" | 21:08 |
fungi | dzimine: new setuptools was released today which removed "Feature" for setup.py | 21:08 |
fungi | dzimine: looks like MarkupSafe expects it | 21:08 |
*** aysyanne has quit IRC | 21:08 | |
*** Ryan_Lane has joined #openstack-infra | 21:08 | |
mordred | I agree with fungi | 21:09 |
fungi | dzimine: you should probably convince the MarkupSafe developers to fix that | 21:09 |
*** andreaf has quit IRC | 21:09 | |
mordred | fungi: wow - really - the setuptools guys released a breaking change? | 21:09 |
mordred | that just removes something? | 21:09 |
*** mrodden1 has quit IRC | 21:09 | |
fungi | mordred: they deprecated "Feature" in 2.x and removed it in 3.x | 21:09 |
*** mriedem1 has joined #openstack-infra | 21:09 | |
*** mriedem has quit IRC | 21:09 | |
fungi | today | 21:09 |
*** andreaf has joined #openstack-infra | 21:09 | |
*** sweston has joined #openstack-infra | 21:09 | |
lifeless | mordred: yes, see dstufft's note yesterday | 21:10 |
*** mriedem has joined #openstack-infra | 21:10 | |
fungi | though admittedly most of the time deprecation warnings are more or less silent, so some packages continued using it unaware they were eventually going to break | 21:10 |
fungi | cffi was also known to be affected, and they were working on a fix as of today | 21:11 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard: Fixed name resolution in OAuth token https://review.openstack.org/78764 | 21:12 |
pleia2 | clarkb: nice, centos looks good, but did you mean to set 8G rather than 2G? (I can fix this in fedora revision if so) | 21:12 |
*** lcostantino has quit IRC | 21:13 | |
*** mrodden has joined #openstack-infra | 21:13 | |
clarkb | pleia2: I did, please do change it to 8G | 21:14 |
*** blamar has joined #openstack-infra | 21:14 | |
clarkb | I was testing on an 8G node so used 2G to see it change | 21:14 |
*** mriedem1 has quit IRC | 21:14 | |
dzimine | fungi: any suggestion how to temp-fix it? other than wait for MarkupSafe :) | 21:14 |
clarkb | dstufft: you should be good now I thought they pulled the release | 21:15 |
dzimine | ok, I'll try out. Thanks! | 21:15 |
*** CaptTofu has quit IRC | 21:15 | |
*** lcheng has quit IRC | 21:16 | |
*** jswarren has joined #openstack-infra | 21:16 | |
clarkb | https://pypi.python.org/pypi/setuptools/ they did, 3.0 is gone | 21:16 |
fungi | clarkb: ahh, yep, so they did | 21:16 |
fungi | simple index is missing it now too | 21:16 |
fungi | hopefully someone will file a bug on MarkupSafe anyway so they'll not break on the next release which ends that deprecation | 21:17 |
pleia2 | clarkb: so the centos fix *should* work with fedora, it adds it to the grub.conf fine, but something is weird about fedora, when I reboot it's still got 16G ram :\ | 21:18 |
clarkb | pleia2: the image you are using may not include the bootloader in it or some such | 21:19 |
clarkb | pleia2: the non pvhvm rax images have this problem too | 21:19 |
clarkb | pleia2: but as long as it doesn't completely fall over and break I think we decided to go with it | 21:19 |
clarkb | fungi: ^ you seemed knowledgeable about that stuff | 21:20 |
pleia2 | clarkb: yeah, it doesn't come with grub2, just comes with "grubby" which is a tool that can update the config | 21:20 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Fix nesting for required files https://review.openstack.org/78770 | 21:20 |
pleia2 | and the config does exist | 21:20 |
clarkb | I am admittedly in the dark a bit when it comes to the various ways you can boot via kvm | 21:20 |
* SpamapS looks into why novaclient connections don't have a timeout | 21:20 | |
clarkb | SpamapS: ++ | 21:20 |
* SpamapS finds that it is because none of urllib3, requests, or novaclient, set one. | 21:20 | |
SpamapS | You must, as a novaclient user, set a timeout, currently. | 21:21 |
SpamapS | which seems.. sily | 21:21 |
SpamapS | silly even | 21:21 |
*** rpodolyaka has joined #openstack-infra | 21:21 | |
fungi | clarkb: the issue being that some virtualization implementations boot with an external bootloader (some even with an external kernel too) so if your second-stage bootloader isn't being run from within the image or the external bootloader isn't looking within the image for its configuration then you can end up having no guest-level control over kernel command line parameters | 21:22 |
*** ivanand has joined #openstack-infra | 21:22 | |
clarkb | fungi: gotcha | 21:22 |
fungi | clarkb: though in this case, i expect our providers are launching a secondary bootloader from within the image, since that's the most flexible way to support lots of different platforms | 21:23 |
pleia2 | so, it doesn't hurt to do the centos thing to fedora, it just doesn't work | 21:23 |
fungi | rackspace may, however, document what their bootloader sequence and makeup looks like | 21:24 |
*** bhuvan has joined #openstack-infra | 21:24 | |
*** bhuvan_ has joined #openstack-infra | 21:24 | |
fungi | if so and if someone can find that information, then we don't have to guess | 21:24 |
clarkb | pleia2: in that case I think we collapse the two elifs into one | 21:24 |
clarkb | elif [centos] || [fedora] | 21:25 |
pleia2 | clarkb: and just deal with fedora images ending up with too much ram? | 21:25 |
annegentle | error: src refspec 0.9 matches more than one. | 21:25 |
annegentle | I'm trying to do a tagged release of the openstack-doc-tools repo, I've done it a few times now. But for 0.9 I'm getting " | 21:25 |
annegentle | that | 21:25 |
annegentle | I don't see a 0.9 release on the tags | 21:25 |
*** dzimine has quit IRC | 21:25 | |
clarkb | pleia2: yeah | 21:25 |
pleia2 | clarkb: wfm | 21:25 |
clarkb | pleia2: we can't boot them in hpcloud currently which is where we need the restriction | 21:25 |
pleia2 | yeah | 21:26 |
annegentle | http://git.openstack.org/cgit/openstack/openstack-doc-tools/refs/ | 21:26 |
clarkb | pleia2: but "handling" fedora keeps the nodepool scripts simple | 21:26 |
* pleia2 nods | 21:26 | |
clarkb | annegentle: what does `git tag` locally say? | 21:26 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Make storyboard run over ssl https://review.openstack.org/78747 | 21:26 |
annegentle | clarkb: oh it's there, locally, 0.9. Hm | 21:26 |
pleia2 | clarkb: you want to take care of this in the review? also commit message typo: nodepoll | 21:27 |
clarkb | pleia2: sure and thanks | 21:27 |
pleia2 | sure thing, thanks | 21:27 |
*** dzimine has joined #openstack-infra | 21:27 | |
dstufft | mordred: Note, the thing that was pulled was deprecated in version 1.0 | 21:27 |
annegentle | clarkb: looks like I can git tag -d 0.9 | 21:28 |
annegentle | then start over? | 21:28 |
dstufft | I don't think I agree with the decision to pull it | 21:28 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Limit nodepool nodes to 8GB of RAM https://review.openstack.org/78440 | 21:28 |
clarkb | pleia2: ^ | 21:28 |
dstufft | but it had some warning at least :/ | 21:28 |
clarkb | dstufft: to be fair version 1.0 is less than a year old right? | 21:28 |
*** thuc has quit IRC | 21:28 | |
clarkb | didn't it happen late summer early fall | 21:28 |
clarkb | annegentle: yes that should be fine | 21:29 |
annegentle | clarkb: thanks | 21:29 |
*** thuc has joined #openstack-infra | 21:29 | |
dstufft | clarkb: august 2013 yea | 21:29 |
*** ivanand has quit IRC | 21:29 | |
openstackgerrit | A change was merged to openstack-infra/storyboard: Fixed name resolution in OAuth token https://review.openstack.org/78764 | 21:29 |
annegentle | clarkb: hm still seeing it | 21:30 |
*** bhuvan_ has quit IRC | 21:30 | |
*** bhuvan has quit IRC | 21:30 | |
mordred | dstufft: I just think if setuptools is going to start aggressively doing stuff - they should replace easy_install with pip | 21:31 |
*** jcoufal_ has joined #openstack-infra | 21:31 | |
mordred | since easy_install is ACTUALLY broken | 21:31 |
clarkb | annegentle: it shows up in `git tag` after deleting it? | 21:31 |
*** jcoufal has quit IRC | 21:31 | |
mordred | and setuptools.Feature is only probably weird and doesn't really hurt many people | 21:31 |
dstufft | mordred: it actually broke things for some popular projects | 21:31 |
clarkb | pleia2: wow I fail | 21:31 |
dstufft | setuptools.Feature that is | 21:31 |
mordred | dstufft: the existence? or the removal? | 21:32 |
dstufft | SQLAlchemy, cffi, Markupsafe | 21:32 |
dstufft | the removal | 21:32 |
*** dzimine has quit IRC | 21:32 | |
dstufft | they were using it | 21:32 |
mordred | yah -I'm saying, if they're willing to remove that ... | 21:32 |
dstufft | mostly for optional C extensions | 21:32 |
mordred | perhaps let's replace easy_install instead | 21:32 |
mordred | because, you know, ponies and unicorns | 21:32 |
pleia2 | clarkb: it's ok, you're just helping me get my review status up | 21:32 |
pleia2 | stats | 21:32 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Limit nodepool nodes to 8GB of RAM https://review.openstack.org/78440 | 21:32 |
clarkb | I swear those comments registered but then totally forgot | 21:33 |
*** thuc has quit IRC | 21:33 | |
*** thuc has joined #openstack-infra | 21:34 | |
*** jhesketh_ has joined #openstack-infra | 21:34 | |
*** thuc has quit IRC | 21:34 | |
jhesketh_ | Morning | 21:34 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Fix nesting for required files https://review.openstack.org/78770 | 21:35 |
*** thuc has joined #openstack-infra | 21:35 | |
jeblair | jhesketh_: good morning | 21:36 |
*** madmike has quit IRC | 21:36 | |
jeblair | jhesketh_: i'm sorry i'm behind on your zuul changes; i hope to catch up next week | 21:36 |
jhesketh_ | jeblair: not at all, no need to apologise | 21:37 |
pleia2 | clarkb: so my final concern here is that prepare_node is the common script (tripleo uses it too), and we may not want to limit tripleo to 8G always | 21:37 |
jhesketh_ | there's much more important things :-) | 21:37 |
openstackgerrit | A change was merged to openstack-infra/meetbot: manual: fix typo https://review.openstack.org/78130 | 21:37 |
*** dzimine has joined #openstack-infra | 21:37 | |
jeblair | jhesketh_: btw, there was a bit more brainstorming this morning about the replication check in zuul; you might want to scan scrollback for it | 21:37 |
*** jcoufal_ has quit IRC | 21:37 | |
pleia2 | clarkb: can we move this to devstack and bare scripts? | 21:37 |
jeblair | jhesketh_: we don't have an answer yet, just some more thoughts | 21:37 |
clarkb | pleia2: I think it is ok to limit the tripleo slave node | 21:38 |
clarkb | pleia2: as the actual test envs are behind that node | 21:38 |
pleia2 | clarkb: lifeless isn't thrilled with this idea | 21:38 |
jhesketh_ | jeblair: okay, will do | 21:38 |
clarkb | we aren't limiting the tests in any way, just the proxy node | 21:38 |
pleia2 | yeah, aware | 21:38 |
jeblair | pleia2: not okay limiting the 8g node to 8g? | 21:38 |
clarkb | right that too | 21:39 |
*** julim has quit IRC | 21:39 | |
pleia2 | jeblair: well they're 8G now :) | 21:39 |
lifeless | jeblair: can't we just put it in prepare_devstack.sh ? | 21:39 |
jeblair | i don't think it's a bad idea for the unit test nodes either | 21:39 |
jeblair | so if not everywhere, then where pleia2 suggested (bare and devstack) | 21:40 |
lifeless | jeblair: I'm just concerned that if we want more memory for cache on the slaves, that we don't want to perturbate devstack at that time | 21:40 |
*** dzimine has quit IRC | 21:40 | |
clarkb | what cache? | 21:41 |
clarkb | these nodes are used to proxy gearman right? | 21:41 |
clarkb | are we using tmpfs on them for something? | 21:41 |
mordred | so - I think if a node wants to have more memory, then it's a different type of node and should potentially have a different node definition | 21:41 |
*** denis_makogon has joined #openstack-infra | 21:44 | |
clarkb | sdague: https://review.openstack.org/#/c/76945/7/modules/openstack_project/files/zuul/layout.yaml do you need to reconcile that with the thing you are doing? | 21:45 |
*** atiwari has quit IRC | 21:46 | |
*** yamahata has quit IRC | 21:47 | |
*** yamahata has joined #openstack-infra | 21:47 | |
zaro | i got a recent request to make another release of jjb. Last release was in Nov 2013. seems like a good time for another release. any infra core want to cut one? | 21:48 |
sdague | clarkb: yeh | 21:48 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Lower rax-ord max-servers in nodepool to 56 https://review.openstack.org/78780 | 21:48 |
pleia2 | clarkb: disk cache for building all the images on the nodepool node | 21:48 |
jeblair | zaro: can it wait until next week? | 21:49 |
clarkb | pleia2: that is a tmpfs? | 21:49 |
sdague | clarkb: honestly, we should layer that on top of the one I've got | 21:49 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Make storyboard run over ssl https://review.openstack.org/78747 | 21:49 |
jeblair | zaro: (i'd like to avoid making anyone's life more difficult this week if possible) | 21:50 |
pleia2 | clarkb: er, no | 21:50 |
clarkb | pleia2: then why does the ram matter? | 21:50 |
*** dkliban has joined #openstack-infra | 21:51 | |
SpamapS | ok so we've got 45 active nodes in the tripleo CI cloud.. but still getting "NOT_REGISTERED" for checks | 21:52 |
*** jcoufal has joined #openstack-infra | 21:52 | |
*** lcheng has joined #openstack-infra | 21:52 | |
lifeless | clarkb: pleia2: hi; I'm here | 21:52 |
*** blamar has quit IRC | 21:52 | |
lifeless | I think there is some confusion - I know I'm confused :) | 21:53 |
SpamapS | fungi: ^^ ? | 21:53 |
*** blamar has joined #openstack-infra | 21:53 | |
clarkb | lifeless: we want our slave nodes to boot with 8GB of memory | 21:53 |
jeblair | SpamapS: OverLimit: Quota exceeded for instances: Requested 1, but already used 100 of 100 instances (HTTP 413) (Request-ID: req-88fd93b8-dd2d-4665-a385-416daa8d157c) | 21:53 |
jeblair | fungi: ^ | 21:53 |
clarkb | so that we don't let code slip through that requires larger nodes than that | 21:53 |
SpamapS | wonderful | 21:53 |
fungi | jeblair: SpamapS: yep | 21:53 |
fungi | SpamapS: lifeless: remember i said i saw a ton of instances in "error" state? | 21:54 |
clarkb | lifeless: it is an artificial limit on RAM so that we can boot flavors with more CPU or disk or anything that isn't RAM | 21:54 |
SpamapS | yeah | 21:54 |
SpamapS | fungi: 52 | 21:54 |
SpamapS | neutron is choking because of db queue pool things | 21:54 |
SpamapS | but some things are working | 21:54 |
mordred | lifeless: which is a workaround for an HP Cloud 1.1 issue that is being worked but will be the way things are for a while | 21:54 |
fungi | SpamapS: we're not getting any all the way to a ready state | 21:54 |
SpamapS | quotas are a bit inaccurate... 95 instances total including the errors. | 21:55 |
SpamapS | fungi: ahh | 21:55 |
*** thuc has quit IRC | 21:56 | |
*** thuc has joined #openstack-infra | 21:56 | |
zaro | jeblair: no hurry. just throwing it out there. | 21:56 |
jeblair | zaro: ok, cool. | 21:57 |
lifeless | clarkb: ok. So I want to be able to give the tripleo-gate slaves more RAM in future without accidentally breaking devstack-gate | 21:58 |
clarkb | lifeless: thats fair, but all I am asking is why? | 21:58 |
lifeless | clarkb: because, we build disk images in those slaves, and thats too slow today, and we haven't analyzed why its slow yet. | 21:59 |
clarkb | what runs on those machines that needs more than 8GB of ram? | 21:59 |
fungi | SpamapS: so currently, nodepool isn't aware of any existing nodes in that provider... any it attempts to build immediately meet with over quota errors, and so it deletes them and tries again | 21:59 |
openstackgerrit | A change was merged to openstack-infra/config: Fixed update of env var in manila's job https://review.openstack.org/76969 | 21:59 |
lifeless | clarkb: we build 4-5 disk images, which means LOTS of IO so we want lots of page cache. | 21:59 |
SpamapS | fungi: quotas fixed | 21:59 |
annegentle | clarkb: sorry had to step away. So I removed 0.9 tag, then added it back, then on the git push gerrit 0.9 I'm getting the error again | 21:59 |
lifeless | clarkb: plus we use a tmpfs to store the transient image | 21:59 |
lifeless | clarkb: upping them to 16G is one of the very first items on my 'and now we try optimising things' list | 21:59 |
annegentle | clarkb: and I don't see a 0.9 tag on the remote | 22:00 |
*** zhiyan is now known as zhiyan_ | 22:00 | |
pleia2 | clarkb: ah, so we do use a tmpfs | 22:00 |
pleia2 | sorry :) | 22:00 |
fungi | SpamapS: i see 52 stably building now. crossing fingers again | 22:01 |
* fungi needs to go make some dinner... bbiaw | 22:01 | |
SpamapS | fungi: thanks | 22:01 |
*** zns has quit IRC | 22:02 | |
*** pdmars has quit IRC | 22:03 | |
clarkb | annegentle: can you paste the output? | 22:03 |
clarkb | lifeless: gotcha, but wouldn't you need to change the nodepool configs anyways? | 22:03 |
clarkb | I see this as making things more complicated for a noop | 22:04 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Add fingerprint for swift bug 1288918 https://review.openstack.org/78788 | 22:04 |
clarkb | (and the nodepool scripts are already unfortunately complicated) | 22:04 |
jeblair | clarkb: i think he's saying that he would immediately need to move the 8g knob out of the tripleo path because he needs it not to be there | 22:04 |
jeblair | in order to do the optimization testing he was planning | 22:05 |
*** dzimine has joined #openstack-infra | 22:05 | |
fungi | SpamapS: you have ready nodes | 22:06 |
*** zns has joined #openstack-infra | 22:06 | |
annegentle | clarkb: sure here's more than you want probably https://gist.github.com/anonymous/adec7c871aed75594e70 :) | 22:07 |
*** lcestari_ has quit IRC | 22:07 | |
SpamapS | fungi: is nodepool failing to delete the 52 that are in error? | 22:07 |
fungi | SpamapS: probably, since it didn't/doesn't seem to know they exist | 22:07 |
*** atiwari has joined #openstack-infra | 22:08 | |
*** rpodolyaka has quit IRC | 22:09 | |
*** khyati has quit IRC | 22:09 | |
lifeless | clarkb: if we change the nodepool config and a script caps our ram on boot, thats going to make the change a no-op right? | 22:09 |
fungi | SpamapS: spot-checking one of those in an error state, it failed on create and the log says the exception was "OperationalError: (OperationalError) (2006, 'MySQL server has gone away') 'UPDATE node SET external_id=%s WHERE node.id = %s' ('efacd12d-95d7-4079-b4d8-3c0303e24515', 2151728L)" | 22:10 |
*** jungleboyj has quit IRC | 22:10 | |
SpamapS | fungi: ow! | 22:11 |
SpamapS | wow | 22:11 |
fungi | SpamapS: that was at 20:56:19 utc | 22:11 |
SpamapS | fungi: I think nodepool DoS'd us. ;) | 22:11 |
SpamapS | poor little cloud :p | 22:12 |
fungi | SpamapS: i think that was sqlalchemy complaining about the local mysql database on the nodepool server, so maybe it ddos'd itself | 22:12 |
*** packet has quit IRC | 22:12 | |
jeblair | it probably exceeded the mysql connection timeout | 22:13 |
fungi | got it | 22:13 |
jeblair | it == time nodepool spent waiting for something to happen | 22:13 |
fungi | and then mysqld said "you're taking too long, it's someone else's turn" | 22:13 |
SpamapS | fungi: the instances on our side have all kinds of crazy errors | 22:14 |
*** zns has quit IRC | 22:14 | |
fungi | SpamapS: sounds like a ringing endorsement of openstack | 22:14 |
SpamapS | we were seeing lots of stuff fail because neutron was throwing 500's | 22:14 |
fungi | s/openstack/neutron/ | 22:14 |
SpamapS | fungi: all software has bugs ;) | 22:14 |
*** mbacchi has quit IRC | 22:14 | |
jeblair | sometimes i wonder if there are people just as busy inside of hp and rackspace cleaning up errors caused by nodepool on their side as we are on ours | 22:14 |
fungi | SpamapS: you *are* aware that neutron is only designed to handle one request in a cloud at a time, right? ;) | 22:14 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Limit non tripleo nodepool nodes to 8GB of RAM https://review.openstack.org/78440 | 22:15 |
clarkb | jeblair: lifeless pleia2 ^ | 22:15 |
SpamapS | jeblair: a cloud of dissonance. | 22:15 |
* fungi is really going to cook dinner now, before his gf comes after him with knives or something | 22:15 | |
jeblair | fungi: eyes on the stove | 22:16 |
* SpamapS cooks eyes in the microwave | 22:16 | |
clarkb | SpamapS: sounds tasty | 22:17 |
pleia2 | clarkb: thanks :) | 22:17 |
*** dizquierdo has joined #openstack-infra | 22:17 | |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Make IRC bot list which failures were seen in which job. https://review.openstack.org/78790 | 22:17 |
annegentle | clarkb: any thoughts? What am I not seeing? | 22:19 |
clarkb | annegentle: sorry catching up on that now | 22:19 |
annegentle | clarkb: no worries | 22:19 |
*** bhuvan_ has joined #openstack-infra | 22:20 | |
*** bhuvan has joined #openstack-infra | 22:20 | |
*** bhuvan has quit IRC | 22:20 | |
*** bhuvan has joined #openstack-infra | 22:20 | |
*** bhuvan_ has quit IRC | 22:20 | |
*** bhuvan_ has joined #openstack-infra | 22:20 | |
clarkb | annegentle: huh | 22:20 |
annegentle | clarkb: yeah me too, puzzled | 22:21 |
*** mkoderer has quit IRC | 22:21 | |
annegentle | clarkb: guess I can try it from another server | 22:21 |
annegentle | clarkb: to make sure it's not something local | 22:21 |
clarkb | annegentle: I wonder if 0.9 is too ambiguous for some reason | 22:22 |
clarkb | like we need to do refs/tags/0.9:refs/tags/0.9 instead | 22:22 |
clarkb | jeblair: ^ | 22:22 |
*** dzimine has quit IRC | 22:25 | |
jeblair | clarkb, annegentle: context switching; hang on a sec | 22:25 |
jeblair | oh. interesting. possibly... | 22:26 |
openstackgerrit | Russell Bryant proposed a change to openstack-infra/config: Trim down gantt check/gate jobs https://review.openstack.org/71648 | 22:26 |
*** rpodolyaka has joined #openstack-infra | 22:26 | |
* jeblair pokes around a bit | 22:26 | |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add fingerprint for swift bug 1288918 https://review.openstack.org/78788 | 22:27 |
jeblair | annegentle: what does "git show 0.9" produce? | 22:27 |
*** rcarrillocruz has joined #openstack-infra | 22:27 | |
jeblair | (i get fatal: ambiguous argument '0.9': unknown revision or path not in the working tree. ) | 22:27 |
*** dims has quit IRC | 22:28 | |
*** rcarrillocruz1 has quit IRC | 22:28 | |
*** yassine has joined #openstack-infra | 22:28 | |
*** jnoller has quit IRC | 22:29 | |
*** mfer has quit IRC | 22:30 | |
*** khyati has joined #openstack-infra | 22:32 | |
*** jhesketh__ has joined #openstack-infra | 22:34 | |
jeblair | clarkb: ping? | 22:35 |
jeblair | annegentle: ping? | 22:35 |
*** bhuvan___ has joined #openstack-infra | 22:35 | |
*** bhuvan__ has joined #openstack-infra | 22:36 | |
*** bhuvan_ has quit IRC | 22:36 | |
clarkb | jeblair: pong | 22:37 |
clarkb | jeblair: git show 0.9 does the same thing for me | 22:37 |
*** bhuvan has quit IRC | 22:37 | |
jeblair | annegentle: i'm still curious about that ^ whenever you are back | 22:38 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Move 1286963 into queries folder https://review.openstack.org/78798 | 22:39 |
*** apevec has left #openstack-infra | 22:39 | |
*** morganfainberg_Z is now known as morganfainberg | 22:41 | |
*** dims has joined #openstack-infra | 22:41 | |
*** sarob_ has quit IRC | 22:42 | |
*** vkozhukalov has joined #openstack-infra | 22:43 | |
*** smarcet has left #openstack-infra | 22:43 | |
*** alex-gone is now known as Alexandra | 22:46 | |
*** esker has quit IRC | 22:47 | |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: If not posting IRC comment for unrecognized error, log something https://review.openstack.org/78801 | 22:48 |
*** esker has joined #openstack-infra | 22:48 | |
*** rpodolyaka has quit IRC | 22:48 | |
*** rpodolyaka has joined #openstack-infra | 22:49 | |
*** rpodolyaka has quit IRC | 22:49 | |
*** rpodolyaka1 has joined #openstack-infra | 22:49 | |
SpamapS | interesting | 22:50 |
*** jungleboyj has joined #openstack-infra | 22:50 | |
SpamapS | most of the error nodes were caused by scheduling errors presumably because of the races that are inherent :p | 22:50 |
*** esker has quit IRC | 22:52 | |
jeblair | SpamapS: wow it's like the errors we see in tempest are real | 22:53 |
SpamapS | jeblair: whoa whoa whoa ... let's not jump to logic and reason | 22:54 |
morganfainberg | jeblair, hehe | 22:54 |
*** jooools has quit IRC | 22:54 | |
*** zns has joined #openstack-infra | 22:55 | |
morganfainberg | pleia2, let me know when you have some time to chat (possibly next week) so we can brainstorm some on monitoring stuffs. I have some thoughts but I want to aim for a little more real-time before we start tossing things into "bugs". Make sure I'm not off in left field. | 22:55 |
*** rcarrillocruz1 has joined #openstack-infra | 22:55 | |
*** thomasem has quit IRC | 22:55 | |
*** lcheng has quit IRC | 22:56 | |
pleia2 | morganfainberg: yeah sure, some time on monday work for you? | 22:56 |
*** rcarrillocruz has quit IRC | 22:56 | |
*** bhuvan___ has quit IRC | 22:56 | |
*** bhuvan__ has quit IRC | 22:56 | |
*** jamielennox|away is now known as jamielennox | 22:57 | |
*** rcarrillocruz has joined #openstack-infra | 22:57 | |
morganfainberg | pleia2, sounds good. uhm, I'm Pacific time, so I tend to be around a bit later than the east coast folks. other than that, i should be mostly free | 22:58 |
JayF | I'm working on trying to get a new project imported for ironic, and I can't find any documentation on what modifications are required to openstack-infra/config to execute on it. I've added it to gerritbot_channel_config.yaml, jenkins_job_builder/config/projects.yaml and review.projects.yaml -- What else do I need before pushing the merge request? | 22:58 |
*** dstanek_afk has quit IRC | 22:58 | |
*** jcooley_ has quit IRC | 22:59 | |
clarkb | JayF: http://ci.openstack.org/stackforge.html is probably the best documentation for the process | 22:59 |
clarkb | JayF: basically s/stackforge/openstack/ as you do it | 22:59 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Move 1286963 into queries folder https://review.openstack.org/78798 | 23:00 |
pleia2 | morganfainberg: yeah I'm pacific too, just give me a ping whenever, I'll be around all day :) | 23:00 |
*** dstanek_afk has joined #openstack-infra | 23:00 | |
morganfainberg | pleia2, awesome | 23:00 |
JayF | And that core group referenced in the gerrit acls is added manually? | 23:00 |
*** rcarrillocruz1 has quit IRC | 23:00 | |
*** rcarrillocruz1 has joined #openstack-infra | 23:01 | |
*** blamar has quit IRC | 23:01 | |
*** lcheng has joined #openstack-infra | 23:02 | |
*** rcarrillocruz has quit IRC | 23:02 | |
clarkb | JayF: if it is a new group the magical scriptage to add the project will add the group, but it won't have any initial members. A human needs to add the first member who is then allowed to add the remaining members | 23:02 |
JayF | gotcha. That works. I can take care of that, tyvm for pointing me at a document | 23:02 |
JayF | trying to reverse engineer it was... difficult | 23:02 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add a script to manage IRC perms https://review.openstack.org/78483 | 23:03 |
clarkb | JayF: you can leave a comment on that change or file a bug with us to let us know who the initial member should be | 23:03 |
JayF | Perfect. Thanks! | 23:04 |
sdague | SpamapS: welcome to actually gating :) | 23:04 |
*** oubiwan__ has joined #openstack-infra | 23:05 | |
jeblair | clarkb, fungi, mordred: http://paste.openstack.org/show/72830/ | 23:05 |
jeblair | clarkb, fungi, mordred: that's what the version of the script i just pushed will do; does that look sane? | 23:06 |
jeblair | actually, i'm going to change something... | 23:06 |
jeblair | (i'm adding +f to the operators acl) | 23:07 |
*** dstanek_afk is now known as dstanek | 23:08 | |
clarkb | looking | 23:09 |
*** mugsie has quit IRC | 23:09 | |
jeblair | clarkb, fungi, mordred: http://paste.openstack.org/show/72831/ | 23:09 |
jeblair | updated | 23:09 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add a script to manage IRC perms https://review.openstack.org/78483 | 23:09 |
fungi | jeblair: so that's the differences it would currently apply? | 23:10 |
fungi | lgtm | 23:10 |
jeblair | fungi: yep | 23:10 |
clarkb | I have to go read channel modes now | 23:10 |
clarkb | there are so many | 23:11 |
jeblair | clarkb: /msg chanserv help flags | 23:11 |
jeblair | is what i've been working off of | 23:11 |
jeblair | clarkb: https://review.openstack.org/#/c/78483/4/modules/openstack_project/files/accessbot/channels.yaml shows the intent | 23:12 |
clarkb | lgtm | 23:12 |
*** oubiwan__ is now known as oubiwann-ef | 23:13 | |
jeblair | okay, i'm going to like go ahead and run that and stuff then. | 23:13 |
*** banix has quit IRC | 23:13 | |
jeblair | 'openstackinfra' does not show up in that list, so i'm fairly confident it won't hose us. | 23:13 |
clarkb | jeblair: start with one channel first if you are worried? | 23:14 |
jeblair | clarkb: off it goes. :) | 23:15 |
jeblair | in slow motion; it has a 1 second sleep after each command to avoid flood protection, so it's kinda in slow motion. | 23:15 |
jeblair | i said that twice | 23:15 |
*** reaper has quit IRC | 23:15 | |
jeblair | in my defense, i'm watching the other screen mostly. | 23:15 |
openstackgerrit | A change was merged to openstack-infra/config: Install memcached on slaves https://review.openstack.org/77209 | 23:15 |
*** mfink has joined #openstack-infra | 23:18 | |
*** bhuvan has joined #openstack-infra | 23:19 | |
*** bhuvan_ has joined #openstack-infra | 23:19 | |
*** yassine has quit IRC | 23:20 | |
*** dizquierdo has quit IRC | 23:22 | |
*** thedodd has quit IRC | 23:23 | |
*** blamar has joined #openstack-infra | 23:24 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add statusbot to all known channels https://review.openstack.org/78807 | 23:24 |
*** bhuvan has quit IRC | 23:26 | |
*** bhuvan_ has quit IRC | 23:26 | |
*** e0ne_ has joined #openstack-infra | 23:26 | |
*** e0ne has quit IRC | 23:28 | |
sdague | so what's the RPC issue - https://jenkins01.openstack.org/job/gate-python-swiftclient-docs/390/console ? | 23:30 |
sdague | https://jenkins01.openstack.org/job/gate-python-swiftclient-docs/390/console | 23:30 |
sdague | grr, one sec | 23:30 |
sdague | 2014-03-06 22:42:55.806 | + git fetch http://zm01.openstack.org/p/openstack/python-swiftclient refs/zuul/master/Z01d5da78d42e4addbab287038e671186 | 23:30 |
sdague | 2014-03-06 22:45:14.060 | error: RPC failed; result=7, HTTP code = 0 | 23:30 |
jeblair | irc changes are all done, and the script doesn't want to make any more changes on the next pass | 23:30 |
jeblair | sdague: that machine is very lightly loaded: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=117 | 23:31 |
sdague | ok | 23:31 |
sdague | well, it just failed | 23:31 |
sdague | causing a gate reset on a pep8 job | 23:32 |
jeblair | sdague: i understand your question. | 23:32 |
*** dkranz has quit IRC | 23:32 | |
sdague | ok | 23:32 |
*** mriedem has quit IRC | 23:32 | |
*** rpodolyaka1 has quit IRC | 23:34 | |
jeblair | so this is tricky... | 23:35 |
jeblair | nodepool logs the v4 address of machines it creates, but not the v6 | 23:35 |
jeblair | and that machine was talking to zm01 over v6 | 23:35 |
zaro | clarkb: please check PS3 cover comment https://review.openstack.org/#/c/60893 | 23:35 |
jeblair | we should probably have all the jobs output their ip addresses | 23:35 |
jeblair | i can guess one from the logs though | 23:35 |
clarkb | jeblair: ++ to dumping ip addresses | 23:36 |
clarkb | zaro: I was hoping that someone had checked it won't break review.o.o too :) | 23:36 |
*** e0ne_ has quit IRC | 23:37 | |
*** andreaf2 has joined #openstack-infra | 23:37 | |
zaro | clarkb: sorry out of my league, but i think you can do. | 23:38 |
openstackgerrit | A change was merged to openstack-infra/config: add tests for gerrit builds https://review.openstack.org/77715 | 23:38 |
clarkb | jeblair: for jheskeths add footer change to zuul. Would you prefer I allow you to review that before approving? | 23:39 |
jeblair | clarkb: please | 23:39 |
clarkb | ok | 23:39 |
*** rpodolyaka has joined #openstack-infra | 23:39 | |
*** andreaf has quit IRC | 23:39 | |
jeblair | sdague: i can't track down a log entry for that. the lack of ip address means i may just be missing it. but it's also possible that the worker just never successfully connected to zm01. | 23:40 |
sdague | ok | 23:41 |
*** bhuvan has joined #openstack-infra | 23:41 | |
*** bhuvan_ has joined #openstack-infra | 23:41 | |
*** mrodden has quit IRC | 23:43 | |
lifeless | clarkb: hey, jog says I should talk to you about getting tripleo seed/undercloud/overcloud logs into the e-r log pipeline | 23:43 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/jeepyb: Welcome message hook query result is an int https://review.openstack.org/78810 | 23:44 |
clarkb | if they end up at logs/screen-servicename.txt on the log archive it will be automagic | 23:44 |
clarkb | jenkins console log is automagic for all jobs | 23:44 |
lifeless | clarkb: these are synced from different nodes | 23:45 |
lifeless | clarkb: and we have N logs per job - e.g. 2 nova-compute logs | 23:45 |
clarkb | oh that makes it trickier | 23:45 |
clarkb | what we really need to do is have the http getter walk the tree over http | 23:45 |
clarkb | but I haven't had time to do that. Then pattern match filenames instead of full paths | 23:46 |
*** rcleere has quit IRC | 23:46 | |
openstackgerrit | A change was merged to openstack-infra/config: Doc: oslo.sphinx -> oslosphinx https://review.openstack.org/77731 | 23:46 |
*** fbo is now known as fbo_away | 23:47 | |
*** bhuvan_ has quit IRC | 23:48 | |
*** bhuvan has quit IRC | 23:48 | |
openstackgerrit | Brant Knudson proposed a change to openstack/requirements: Uncap sphinx https://review.openstack.org/78812 | 23:49 |
openstackgerrit | Joshua Hesketh proposed a change to openstack-infra/zuul: Add debugging metrics to RPC https://review.openstack.org/78813 | 23:53 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/statusbot: Don't crash on invalid UTF8 https://review.openstack.org/78814 | 23:53 |
jeblair | fungi, clarkb: ^ wow. | 23:54 |
*** alexpilotti has quit IRC | 23:54 | |
clarkb | jeblair: wow | 23:56 |
*** jp_at_hp has joined #openstack-infra | 23:56 | |
*** changbl has quit IRC | 23:56 | |
jeblair | irc.client.dont_crash=True | 23:57 |
*** jcoufal has quit IRC | 23:57 | |
*** Ryan_Lane1 has joined #openstack-infra | 23:57 |
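
At 23:15 jeblair notes that the IRC permissions script sleeps for 1 second after each command to avoid flood protection. The pacing pattern could look roughly like the sketch below. This is not the script from change 78483: `send_throttled`, its parameters, and the placeholder commands are hypothetical names made up for illustration, and `print` stands in for a real IRC connection.

```python
import time


def send_throttled(commands, send, delay=1.0, sleep=time.sleep):
    """Send each command through `send`, pausing `delay` seconds after each.

    `send` is any callable taking one command string (hypothetical stand-in
    for an IRC client call). `sleep` is injectable so the pacing can be
    checked without actually waiting.
    """
    sent = 0
    for command in commands:
        send(command)
        # Pause after every command so the server's flood protection
        # does not see a burst of messages.
        sleep(delay)
        sent += 1
    return sent


if __name__ == "__main__":
    # Placeholder commands, not real ChanServ syntax.
    demo_commands = ["first placeholder command", "second placeholder command"]
    send_throttled(demo_commands, print, delay=0.1)
```

Passing `sleep` as a parameter is a design choice for this sketch only; it lets a caller substitute a recorder for `time.sleep` and confirm that one pause follows each command.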