Friday, 2016-03-04

*** ayoung has quit IRC00:07
openstackgerritBen Nemec proposed openstack/tripleo-common: Add capabilities filter for Nova  https://review.openstack.org/28808700:11
openstackgerritBen Nemec proposed openstack/instack-undercloud: Configure nova to use custom scheduler filter  https://review.openstack.org/28818800:14
openstackgerritBen Nemec proposed openstack/instack-undercloud: Configure nova to use custom scheduler filter  https://review.openstack.org/28818800:20
*** lblanchard has joined #tripleo00:21
*** saneax is now known as saneax_AFK00:50
*** ayoung has joined #tripleo01:05
*** mbound has quit IRC01:16
*** mbound has joined #tripleo01:17
*** dprince has joined #tripleo01:18
*** shivrao has quit IRC01:19
*** yamahata has quit IRC01:37
*** shivrao has joined #tripleo01:46
*** cwolferh has quit IRC01:46
*** dmsimard has quit IRC01:50
*** dmsimard has joined #tripleo01:51
*** yuanying has quit IRC01:58
*** yuanying has joined #tripleo01:59
*** trozet has joined #tripleo02:08
*** mbound has quit IRC02:12
*** cwolferh has joined #tripleo02:13
openstackgerritDan Prince proposed openstack-infra/tripleo-ci: WIP: IPv4 network isolation testing  https://review.openstack.org/28816302:15
*** tiswanso has quit IRC02:19
*** tiswanso has joined #tripleo02:20
*** shivrao has quit IRC02:33
*** dprince has quit IRC02:40
*** mburned is now known as mburned_out02:42
*** links has joined #tripleo03:01
*** lblanchard has quit IRC03:07
*** trozet has quit IRC03:08
*** yuanying has quit IRC03:16
openstackgerritRichard Su proposed openstack/tripleo-heat-templates: Store events in Ceilometer  https://review.openstack.org/28756103:25
*** rhallisey has quit IRC03:27
openstackgerritRichard Su proposed openstack/instack-undercloud: Store events in Undercloud Ceilometer  https://review.openstack.org/28673403:29
EmilienMcan I have a review on https://review.openstack.org/#/c/286584/ please ?03:29
*** lblanchard has joined #tripleo03:33
*** panda has quit IRC03:38
*** panda has joined #tripleo03:38
*** yamahata has joined #tripleo03:41
*** links has quit IRC03:44
*** cwolferh has quit IRC04:06
*** yuanying has joined #tripleo04:08
*** yuanying has quit IRC04:12
openstackgerritMichael Chapman proposed openstack/tripleo-heat-templates: Adds OpenDaylight support  https://review.openstack.org/20025304:15
*** lblanchard has quit IRC04:19
*** links has joined #tripleo04:27
*** openstack has joined #tripleo04:29
*** lynxman has quit IRC04:29
*** yuanying has joined #tripleo04:30
*** dtantsur|afk has quit IRC04:30
*** xinwu has quit IRC04:30
*** openstack has quit IRC04:30
*** openstack has joined #tripleo04:32
*** dtantsur has joined #tripleo04:32
*** openstack has quit IRC04:34
*** openstack has joined #tripleo04:35
*** lynxman has quit IRC04:36
*** cwolferh has joined #tripleo04:36
*** openstack has quit IRC04:36
*** openstack has joined #tripleo04:37
*** tiswanso has quit IRC04:38
*** openstack has quit IRC04:39
*** openstack has joined #tripleo14:06
*** tiswanso has joined #tripleo14:07
trownjistr: thanks a ton for all your help14:08
*** mandre has joined #tripleo14:10
openstackgerritMerged openstack/tripleo-heat-templates: Set notification driver for nova to send  https://review.openstack.org/28368614:13
dprincegfidente: can you join here https://redhat.bluejeans.com/u/dprince/14:14
*** Goneri has joined #tripleo14:14
*** tzumainn has joined #tripleo14:16
openstackgerritMerged openstack/tripleo-common: Override OS::Nova::Server for user_data updates  https://review.openstack.org/28476914:18
*** jprovazn has joined #tripleo14:18
gfidentehttps://github.com/openstack/os-cloud-config/blob/master/os_cloud_config/keystone.py#L54614:20
openstackgerritOpenStack Proposal Bot proposed openstack/os-collect-config: Updated from global requirements  https://review.openstack.org/26444614:21
openstackgerritOpenStack Proposal Bot proposed openstack/python-tripleoclient: Updated from global requirements  https://review.openstack.org/26852814:22
slaglejistr: trown : i think you might need to base https://review.openstack.org/#/c/288460 on https://review.openstack.org/#/c/24416214:23
jistrtrown: hey, i'm back from the call, saw your comment on the patch14:23
trownslagle: hmm... but why would nonha pass without that?14:24
jistrslagle: i think we're trying to just get the Default keystone domain back, getting the full keystone init via puppet in is orthogonal i think14:24
*** mandre has quit IRC14:24
trownon a positive note looks like the heat stack-delete issue is resolved14:24
jistrslagle: it was previously created by db sync, now we need to run the bootstrap14:24
jistrslagle: https://github.com/openstack/keystone/blob/f699ca93fc6f2485ec8e76e907572a2f838cd3cb/releasenotes/notes/no-default-domain-2161ada44bf7a3f7.yaml14:25
slagleyea but the ha job was failing while running the bootstrap, https://review.openstack.org/#/c/286352/14:25
slaglethat's why it was disabled14:25
*** dustins has joined #tripleo14:25
slagleonly for ha, nonha passed fine14:26
jistrslagle: yeah because it tried to run too early14:26
jistr(i think :) )14:26
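[editor's note: a minimal sketch of the bootstrap step jistr is describing, run by hand; the patch under review wires the equivalent into puppet. All credentials and endpoint URLs below are placeholders, and the flag names are the Mitaka-era keystone-manage ones.]
```bash
# Hypothetical manual run of the bootstrap that now creates the Default domain,
# the admin project/role/user and the keystone endpoints (formerly done as a
# side effect of db_sync).
sudo keystone-manage bootstrap \
  --bootstrap-password "$ADMIN_PASSWORD" \
  --bootstrap-project-name admin \
  --bootstrap-role-name admin \
  --bootstrap-service-name keystone \
  --bootstrap-region-id regionOne \
  --bootstrap-admin-url http://192.0.2.1:35357 \
  --bootstrap-public-url http://192.0.2.1:5000 \
  --bootstrap-internal-url http://192.0.2.1:5000
# As noted later in the log, the command can safely be re-run.
```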
trownalthough I think the new heatclient feature to prevent me from accidentally deleting my stack is still causing the pingtest cleanup to hang... really confused how the tripleoci periodic job could get past that14:26
slaglejistr: ok14:26
openstackgerritJiri Stransky proposed openstack/tripleo-heat-templates: Run keystone-manage bootstrap for HA deployment too  https://review.openstack.org/28846014:30
jistrtrown: it *might* have been a race condition indeed. Could you please try again with ^^ ?14:31
jistradded explicit ordering14:31
trownjistr: sure thing... now that I can cleanly delete stacks retrying is so much nicer14:31
*** leanderthal is now known as leanderthal|mtg14:31
ayoungSo...undercloud neutron uses 192.0?14:33
ayoungfor the control plane.14:33
ayoungAnd, I know, I know...there are no ranges we can count on not being used14:33
openstackgerritDmitry Tantsur proposed openstack/python-tripleoclient: [WIP] Allow 'openstack baremetal configure boot' to guess the root device  https://review.openstack.org/28841714:33
ayoungbut that one...its even more reserved than the rest14:33
jistrslagle: or actually it might have failed previously just because it tried to run on all nodes instead of on $pacemaker_master... going to look if the bootstrap uses API calls or directly goes to DB14:35
*** openstackgerrit_ has quit IRC14:36
trownjistr: I had problems earlier with the bootstrap running on all nodes14:37
shardyayoung: yes, see undercloud.conf, the provisioning network defaults to 192.0.2.0/2414:37
*** openstackgerrit_ has joined #tripleo14:37
trownI think it goes directly to DB14:37
ayoungshardy, so, this is kindof a violation of the RFC:14:37
ayoungshardy, that is not usable as a local, non-routable range.14:38
ayoungshardy, and, of course, IPA enforces that. Which puts me in a tricky position trying to integrate14:38
shardyayoung: perhaps we should default to a different subnet, but you can control it via undercloud.conf to be whatever you want14:39
shardyit does appear to conflict with rfc5737, you're right14:40
ayoungshardy, there are no good answers with IPv4.  We need IPv6, which should be usable by now.14:40
*** mkovacik has quit IRC14:40
ayoungshardy, yeah, found that at midnight last night14:40
ayoungshardy, I was not happy14:40
*** mandre has joined #tripleo14:40
ayoungediting the python of an installed RPM....14:40
shardyayoung: why editing python?14:41
ayoungshardy, heh14:41
ayoungshardy, to remove the check in IPA of course14:41
trownah ha, the periodic job is hitting the new heatclient "feature": 2016-03-04 08:27:00.894 | tripleo.sh -- Overcloud pingtest - time out waiting to delete tenant heat stack, please check manually14:41
shardylike I said, you should be able to configure it via undercloud.conf14:41
shardyotherwise we have two bugs ;)14:41
*** links has quit IRC14:41
ayoungshardy, yeah...I'm just learning this  flow.  Let me see...14:42
shardyayoung: can you raise a launchpad bug with your findings please, then we can discuss there and figure out the best plan?14:42
ayoungshardy, will do14:42
jaosoriorayoung: lol, done that14:42
jaosorior(editing the code from an installed rpm, it's nasty... but gotta do what you gotta do)14:42
jistrtrown: keystone bootstrap doesn't do API calls, and it should probably run before keystone is started. So what i have in the patch is probably wrong, but still there's a possibility that it could succeed, so i wouldn't cancel the test just yet. It probably belongs to step 3 indeed, just on $pacemaker_master i guess though. /cc slagle14:42
shardyayoung: if you're using tripleo.sh then it's probably using the defaults for everything14:43
shardycp /usr/share/instack-undercloud/undercloud.conf.sample ~/undercloud.conf then hack away, and you should be able to refine the addresses to match what you need until we fix the defaults14:43
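[editor's note: a hedged sketch of the workaround shardy describes, for anyone hitting the RFC 5737 clash. The option names are the instack-undercloud ones of this era and the addresses are illustrative placeholders only.]
```bash
# Minimal undercloud.conf overriding the default 192.0.2.0/24 control plane
# network before running the installer (values are examples, not a recommendation).
cat > ~/undercloud.conf <<'EOF'
[DEFAULT]
local_ip = 192.168.24.1/24
network_gateway = 192.168.24.1
network_cidr = 192.168.24.0/24
masquerade_network = 192.168.24.0/24
dhcp_start = 192.168.24.5
dhcp_end = 192.168.24.30
EOF
openstack undercloud install
```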
trownjistr: ok if this fails I will try on step 314:44
openstackgerritPradeep Kilambi proposed openstack/tripleo-heat-templates: Set notification driver for nova to send  https://review.openstack.org/28849714:45
ayoungshardy, https://bugs.launchpad.net/tripleo/+bug/155322214:45
openstackLaunchpad bug 1553222 in tripleo "Default undercloud control plan network violates rfc5737" [Undecided,New]14:45
shardyayoung: thanks!14:45
ayoungah...looovely typp14:45
openstackgerritJiri Stransky proposed openstack/tripleo-heat-templates: Run keystone-manage bootstrap for HA deployment too  https://review.openstack.org/28846014:46
* ayoung needs to figure out what range to use14:46
jistrtrown: here's code for the next attempt ^^14:46
trownjistr: awesome current deploy is in step 5, so should fail or not shortly14:48
*** leanderthal|mtg is now known as leanderthal14:56
bnemecgfidente: derekh: Noticed your earlier discussion about having issues with the postconfig.  I've actually run into the same sort of thing locally when I use net-iso.14:58
bnemecI think we don't noproxy either the admin or public address, so if you have http_proxy set (like we do in CI), then it tries to go through the proxy and hangs because the proxy doesn't know about the address.14:58
derekh^ dprince that could be it14:59
*** mandre has quit IRC14:59
*** jcoufal_ has quit IRC14:59
dprincederekh, bnemec I don't use no_proxy in my environment either14:59
derekhbnemec: sounds fairly plausible, dprince wanna try that in your recheck14:59
dprincederekh: yep, lets do it15:00
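[editor's note: a sketch of the two fixes discussed here; the VIP addresses are placeholders for whatever net-iso assigns.]
```bash
# Either exempt the overcloud admin/public endpoints from the proxy...
export no_proxy="${no_proxy},192.0.2.6,10.0.0.4"   # placeholder admin/public VIPs
# ...or, as dprince does in the tripleo-ci patch below, drop the proxy entirely
# for the postconfig step:
unset http_proxy https_proxy
```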
*** jrist has joined #tripleo15:02
openstackgerritJohn Trowbridge proposed openstack/tripleo-common: Use 'yes' hack for ping test stack delete  https://review.openstack.org/28851115:02
openstackgerritDan Prince proposed openstack-infra/tripleo-ci: WIP: IPv4 network isolation testing  https://review.openstack.org/28816315:05
*** mbound has quit IRC15:05
dprincebnemec: ^^ unset_http there15:08
dprincebnemec: https://review.openstack.org/#/c/288163/3/scripts/deploy.sh15:08
bnemecdprince: Looks good.15:09
bnemecFWIW, the reason I haven't looked into fixing this in the client is that this is all going away once keystone init is done by puppet.15:10
*** mandre has joined #tripleo15:10
dprincebnemec: I know! I was seriously considering arguing we land that patch to get this working15:10
bnemecBut given how long that has taken maybe it's worth fixing anyway.15:10
dprincebnemec: but there is still an HA issue w/ the keystone puppet stuff, so one step at a time15:10
bnemecYeah15:10
trownWOOOOOOOOT!!! Trunk HA overcloud CREATE_COMPLETE!! jistr you rock!!15:14
jistrahahaha awesome :))15:14
trownrunning ping test to be sure, but I think we may have a winner15:14
jistrtrown: can you please try with step 3 (the latest version of the patch) too? i think if step 3 passes too, we should use that15:14
trownjistr: only latest version of the patch works15:15
trownjistr: PS2 did not work15:15
jistrtrown: ok cool :)15:15
trownjistr: interestingly, what is in PS3 is in the keystone init patch, but as suspected there is other stuff there that is not working15:16
jistrtrown, slagle: i'm still not sure if running keystone bootstrap breaks upgrades, maybe it could, but i think at this point we have no choice, we gotta get the CI green15:16
slaglejistr: i didnt think this was breaking CI?15:17
slaglethat's why the bootstrap running got disabled15:17
trownjistr: I confirmed that the keystone-manage bootstrap command can be re-run15:17
slagleor you're saying we need it15:17
*** mbound has joined #tripleo15:17
trownslagle: semantics on broken CI15:18
trownslagle: CI is broken if we can not deploy trunk15:18
trownit is just a hidden breakage so it does not block everyone from working15:18
trownbut it is still broken15:18
derekhtrown: more semantics on broken CI15:18
derekhCI is not broken, trunk is15:18
slagleoh ok, rgr that15:18
*** trozet has joined #tripleo15:19
trownderekh: ya, but tripleo trunk... not other project trunk15:19
*** mandre has quit IRC15:19
slaglewell, i think that's what derekh is saying15:19
slagleor isn't saying15:19
slaglei dunno :)15:20
trownya, there are different ways to say it.... but if we have not promoted anything in over a week, tripleo is in crisis15:20
derekhopenstack trunk is broken and can't be deployed with trunk,15:20
trowneven though everyone can continue to merge stuff that may very well make the situation worse15:20
slaglei mean if keystone and puppet-keystone push a change that breaks existing HA deployments, is that really "tripleo is broken"?15:20
trownwe dont know because we are testing against the past15:20
*** jcoufal has joined #tripleo15:21
slagleit is in the sense, that trunk changed in a backwards incompatible way15:21
slaglei guess that's acceptable for openstack15:21
slaglesince it happens...all the time15:21
derekhIn general we can just say "deploying trunk is broken", doesn't matter what project is the problem15:21
trownright... and I think we want to be able to deploy trunk...15:22
*** mandre has joined #tripleo15:22
trownif not, that is a bigger problem15:22
derekhmy point here is that "ci is broken" is the term I want people to stop using,15:22
derekhunless it actually is broken15:22
slagleci is working at catching failures :)15:23
trownderekh: ok, so when dtantsur or pradk or any of the people who are trying to integrate with tripleo ask me why their patch can't pass because we are testing against 8-30 days in the past, I guess that is not CI being broken?15:23
derekhtrown: well, it's not broken, they would have the exact same problem when trying to test it locally with tripleo.sh15:24
derekhtrown: its not a problem specific to ci15:24
openstackgerritGiulio Fidente proposed openstack/tripleo-heat-templates: DO NOT MERGE: test CI netiso on liberty branch  https://review.openstack.org/28852615:24
gfidentederekh, ^^ makes sense?15:24
dtantsurso I guess we can call it "tripleo is broken" :D15:24
* dtantsur ducks15:24
* bnemec steps away from the semantic minefield :-)15:25
trownthis just feels like a step in the wrong direction semantically... but I agree it is a minefield15:25
derekhour dev process it broken15:25
slaglemy spirit is broken15:26
dtantsurEVERYTHING IS BROKEN15:26
shardy'cmon guys, it's FRIDAY, shall we lighten up? :)15:26
dtantsurquickly, post owls15:26
trownhttps://www.youtube.com/watch?v=fKd-07E6ses15:26
shardyowls, yes there's a good idea :)15:27
derekhHaving said all that, ci was broken for a few hours this morning when all the testenvs ran out of RAM, so ya....that happens too15:27
openstackgerritKarim Boumedhel proposed openstack/puppet-pacemaker: When using Rhevm stonith mechanism, fence packages could allready be referenced somewhere else in the code so such a requirement shouldnt sit in the defined type. Constraints on the stonith resources are also unecessary  https://review.openstack.org/28852715:27
dtantsurquite a long summary, isn't it ^^?15:27
*** yamahata has joined #tripleo15:28
derekhgfidente: never tried anything like that but it looks sane15:28
trownjistr: no dice on the ping test... though I am betting this is some other issue that was just hidden by not being able to get to deploy15:31
trownjistr: I guess we will see if tripleoci passes15:31
*** mburned is now known as mburned_out15:33
*** mburned_out is now known as mburned15:33
jistrtrown, slagle: ok i *thought* that keystone issue was the reason CI was red, apparently i'm really confused today :D15:33
trownjistr: it is one of the reasons the periodic CI is red15:34
*** david_lyle__ has joined #tripleo15:34
trownderekh: if we were deploying trunk of everything (like all other project CI), CI would be broken if we cant deploy trunk15:35
bnemecAll of the other projects are co-gating too.15:35
trownderekh: I totally understand why that is a bad thing, and would block people getting work done, but it still feels like CI is broken if we cant deploy trunk15:36
dprincetrown: I always use trunk in my local dev environment15:36
trownobviously I am biased given that in order for tripleo to participate in RDO we need tripleo to work with trunk15:36
dprincetrown: I think derekh usually does too15:37
*** david-lyle has quit IRC15:37
dprincetrown: and now we have periodic jobs on trunk nightly as well15:37
openstackgerritDmitry Tantsur proposed openstack/python-tripleoclient: Allow 'openstack baremetal configure boot' to guess the root device  https://review.openstack.org/28841715:37
trowndprince: which have never passed15:37
dprincetrown: I would rather have a subset of the team focussed on keeping trunk working15:37
dprincetrown: they have passed once or twice15:37
*** panda has quit IRC15:37
dprincetrown: http://tripleo.org/cistatus-periodic.html15:37
dprincetrown: nonha and ceph actually passed last night15:38
*** panda has joined #tripleo15:38
trowndprince: that is not my argument... I agree on that part of the tripleoci strategy; it is just that the semantic game of "tripleoci is not broken even though it can't deploy trunk" says to me that working with trunk is not a priority for the tripleo project15:38
slagleshardy: hey, in overcloud.yaml, in AllNodesValidationConfig, does the fact that Controller is used there in properties means there is an implicit depends_on?15:38
*** mbound has quit IRC15:38
slagleshardy: we are seeing ComputeAllNodesValidationDeployment fail a lot in CI, failing to ping the controller ip's15:39
slagleshardy: i'm just wondering if maybe the controller isn't fully done with the NetworkDeployment15:39
trowndprince: we cant promote on only some jobs passing...15:39
*** yamahata has quit IRC15:40
dprincetrown: agree in general. but do note that the HA is generally unstable anyways, even with pinned CI repository15:40
dprincetrown: so we might occasionally make exceptions to this15:40
shardyslagle: Yep, the get_attr is an implicit depends_on15:40
shardyso the Controller ResourceGroup must be CREATE_COMPLETE before that runs15:41
slagleok, dang, i guess :) that was my only theory15:41
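[editor's note: a minimal, hypothetical template illustrating the implicit dependency shardy describes; the resource types are stand-ins, not the real overcloud.yaml ones.]
```bash
# A get_attr reference alone makes Heat wait for the referenced resource to be
# CREATE_COMPLETE, with no explicit depends_on.
cat > implicit-depends-sketch.yaml <<'EOF'
heat_template_version: 2015-04-30
resources:
  controller:
    type: OS::Heat::RandomString
  validation_config:
    type: OS::Heat::SoftwareConfig
    properties:
      # implicit dependency: only created after 'controller' completes
      config: {get_attr: [controller, value]}
EOF
heat stack-create implicit-depends-demo -f implicit-depends-sketch.yaml
```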
trowndprince: https://review.openstack.org/288460 gets me to CREATE_COMPLETE with HA on trunk15:41
trowndprince: though the ping test is failing for me now15:41
openstackgerritDmitry Tantsur proposed openstack/python-tripleoclient: Allow 'openstack baremetal configure boot' to guess the root device  https://review.openstack.org/28841715:45
dtantsurlucasagomes, ^^15:45
dtantsurfinished with this one, I guess15:45
EmilienMFYI puppet-ceilometer is about to drop alarming code, since it's now in Aodh. We should really consider using Aodh in TripleO from now on15:47
dprinceEmilienM: I noticed aodh-evaluator is raising connection errors with network isolation enable15:49
EmilienMdprince: interesting, do you have a trace I can look?15:50
EmilienMpradk: ^15:50
dprinceEmilienM: not ATM. I will get it. Just noticed it as something new...15:50
dprincehttps://etherpad.openstack.org/p/tripleo-mitaka-rc-blockers15:50
dprincejistr, marios, shardy, gfidente, jprovazn, slagle, bnemec, derekh15:51
dprinceregarding the etherpad above, did we reach agreement about the requirements for landing those code changes?15:52
dprinceThis is to continue our tripleO IRC discussion from http://eavesdrop.openstack.org/meetings/tripleo/2016/tripleo.2016-03-01-14.04.log.html15:52
openstackgerritBen Nemec proposed openstack/tripleo-common: Add capabilities filter for Nova  https://review.openstack.org/28808715:53
dprinceshardy: We decided to revisit this next week I think but I'm getting pings about making this decision today so I wanted to try and catch people before the weekend if possible15:54
openstackgerritBen Nemec proposed openstack/instack-undercloud: Configure nova to use custom scheduler filter  https://review.openstack.org/28818815:56
openstackgerritDmitry Tantsur proposed openstack/python-tripleoclient: Completely removed the old discovery image support  https://review.openstack.org/28854615:56
jaosoriorCan someone review this cr https://review.openstack.org/#/c/287199/ ? it's needed to solve this bz https://bugzilla.redhat.com/show_bug.cgi?id=131385515:56
openstackjaosorior: Error: Error getting bugzilla.redhat.com bug #1313855: NotPermitted15:56
shardydprince: I thought we agreed manual testing would be OK, but we didn't reach consensus about whether we can push for cutting an RC at the same time as other projects15:56
shardyhttps://review.openstack.org/#/c/278979/ is now ready for a second reviewer if that helps get one off the list ;)15:57
trownI would add not being able to deploy the other RCs would be a blocker too15:58
dprinceshardy: manual testing with what though? I hear people saying they hven't used upstream TripleO for weeks....15:58
bnemeco.O15:58
dprinceshardy: manual testing is fine. But it is also risky if we aren't all doing the same things15:58
dprinceshardy: like very few people use network iso I think15:59
bnemecDamn, the list of must-haves has gotten quite long. :-(15:59
dprinceEmilienM: https://bugs.launchpad.net/tripleo/+bug/155325015:59
openstackLaunchpad bug 1553250 in tripleo "aodh-evaluator: ToozConnectionError: Error 113 connecting to 172.16.2.5:6379. No route to host." [Medium,Triaged]15:59
shardydprince: My impression was that we weren't going to block ipv6 and SSL on CI, but they would both get significant manual testing (with upstream code) and we'd push hard on getting the CI in place after the RC15:59
*** yamahata has joined #tripleo15:59
bnemecI seriously thought it was going to be IPv6 and a few other things that were mostly finished.15:59
shardythe alternative is to block that stuff, but be forced to backport it after we branch16:00
dprincebnemec: yeah that list grew a bit didn't it :)16:00
shardywhich will then block all the architectural rework as if we hadn't branched16:00
*** leanderthal is now known as leanderthal|afk16:00
dprinceshardy: without CI I don't think we do architectural rework16:00
dprinceshardy: CI blocks that too16:01
shardywell, we have CI, just not of those new features16:01
shardy(and a bunch of other stuff, I know)16:01
dprinceshardy: right, but we still don't have CI on features from the last release16:01
shardydprince: Ok, what are you proposing?16:01
openstackgerritJohn Trowbridge proposed openstack/tripleo-common: Use 'yes' hack for ping test stack delete  https://review.openstack.org/28851116:02
shardyblock those features on CI, cut the RC anyway, then have another cycle where feature backports are permitted?16:02
dprinceshardy: well, I think I've just been complaining so far :)16:02
*** Goneri has quit IRC16:02
pradkdprince, EmilienM, i dont think that is specific to aodh-evaluator.. i noticed that in with ceilometer as well before aodh patch merged16:02
pradkdprince, EmilienM, see ceilometer/central.log16:02
shardydprince: Whichever way we go, it's not great, but to me getting a clean-slate to work from for Newton is really important16:02
bnemecdprince: Our scrum call is starting, so some of us are going to be semi-away for a few minutes.16:03
pradkEmilienM, dprince i think its the tooz setup thats http://paste.openstack.org/show/489348/16:03
dtantsurrelated question: when is feature freeze for tripleo?16:03
shardydprince: perhaps we do need to clearly communicate the need for CI coverage of all new features from Newton onwards tho?16:04
jaosoriorEmilienM Thanks for the review16:04
shardy(assuming we actually get CI capacity to handle that)16:04
EmilienMjaosorior: you don't have to thank me to review code16:04
dprinceshardy: Yeah, I just hate to see us trash Mitaka in the meantime16:04
trowndtantsur: never?16:04
jaosoriorfair enough16:04
dtantsurlol16:04
dprinceslamming features in quickly without CI is a sure way to break something16:04
shardydprince: which are you most worried about, ipv6?16:05
lucasagomesdtantsur, commented inline with a suggestion16:05
lucasagomesdtantsur, lemme know what you think16:05
shardydprince: well, it's a sure way to land something which doesn't work or quickly breaks16:05
dprinceshardy: I'd really like the IPv4 CI job in first16:05
shardybut I guess the question is, do we have enough coverage for confidence re regressions16:05
shardysounds like the answer is no16:05
shardydprince: Ok, what's the eta on that?16:06
dprinceshardy: we don't, IPv4 was broken for weeks. Nobody noticed16:06
dtantsurlucasagomes, well, I don't like both what I have and what you suggest :)16:06
dprinceshardy: maybe today, maybe next week. We are closing in on it16:06
lucasagomesdtantsur, hah16:06
lucasagomesdtantsur, right... yeah it's tricky16:06
shardydprince: Ok, maybe we block on that then16:06
dtantsurlucasagomes, I don't want people to really make lists like sda,vda,... not sure how to detect "first" better though16:07
lucasagomesdtantsur, yeah... I think that's because the definition of "first" may be a bit weak there... We can have devices connected to different disk controllers16:07
shardydtantsur: we haven't had a formal feature freeze which is why we're in this situation16:07
lucasagomesso which one are the "first" ?16:07
lucasagomesit sounds a bit bogus16:07
shardydtantsur: I hope we can move to a much stricter model (like other projects) from Newton16:08
dtantsurlucasagomes, well, for people I was talking to, first is /dev/sda... (or rather the same behaviour that we had in Kilo, i.e. sda/vda/hda)16:08
dtantsurshardy++16:08
dtantsurlucasagomes, so do you think we should allow --detect-root-device=largest, --detect-root-device=smallest, --detect-root-device=sda,vda,hda?16:09
lucasagomesdtantsur, right, but that assumption is wrong IMHO... that's why the list of device order is slightly better, because each operator can input whatever they think the order from first to last is16:09
lucasagomesthey can define it16:09
dtantsurlucasagomes, i.e. allow a couple of strategies + allow to provide a list16:09
*** mikelk has quit IRC16:09
lucasagomesdtantsur, yeah the smallest and larger makes total sense16:09
lucasagomesdtantsur, I think that way is better, because we can't logically define what is first and what is last (unless all disks are connected to the same disk controller, so we can actually look at the physical address)16:10
dtantsurlucasagomes, last question: do we need to support full paths, like --detect-root-device=/dev/sda,/dev/hda?16:10
dtantsurlucasagomes, I'm afraid of people having some md-based magic with devices not in /dev16:11
*** mgould has quit IRC16:11
lucasagomesdtantsur, could be yeah... I don't know how the "name" is set in the disks list there16:12
lucasagomesbut if that's the full device path yeah we can do full path16:12
dtantsurlucasagomes, IPA returns /dev/sda etc16:12
lucasagomessounds good then16:12
dprinceshardy: I've added 2 more items to the list. Which may not be "blockers" but shouldn't get left out16:12
dtantsurlucasagomes, hmm, but it does it this way: https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L109 :)16:12
dprinceshardy: The upgrades CI job, and the IPv4 network iso testing (from last release)16:13
lucasagomesdtantsur, hah yeah... well all devices will be under /dev anyway16:13
lucasagomeseven the ones you look /sys/blocks/by-label etc... are in /dev/ too16:13
dtantsurlucasagomes, ok, so no full paths for now then.. we can always enable them later16:13
lucasagomesdtantsur, cool16:13
dprinceshardy: I think we already agreed we'd have the upgrades CI job in place before we did architectural changes anyways. So that blocks newton regardless IMO16:13
dtantsurlucasagomes, thanks for review, will post an update soon16:14
lucasagomesdtantsur, cool no problem16:14
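[editor's note: the interface being sketched in this exchange, as proposed in the WIP tripleoclient patch; the flag and its values are hypothetical at this point and may change before merging.]
```bash
# Proposed strategies from the discussion above (not a merged CLI option):
openstack baremetal configure boot --detect-root-device=largest
openstack baremetal configure boot --detect-root-device=smallest
openstack baremetal configure boot --detect-root-device=sda,vda,hda
```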
*** rdopiera has quit IRC16:14
dprinceshardy: if we had CI on those two things I'd be happy to merge the rest of it16:14
*** Marga_ has quit IRC16:17
*** david_lyle__ is now known as david_lyle16:17
dprinceshardy: I'd like to take a step back and ask why we are branching again16:17
dprinceshardy: you want to branch so that it opens up N for architectural work16:18
dprinceshardy: I'm keen on this too16:18
shardydprince: we are also branching so downstreams (such as RDO) have a release to consume16:19
slagledprince: hey, was in a mtng, let me catch up on the conversation16:19
shardywe need to reach the point where we can say "this is mitaka TripleO"16:19
shardyjust like all other projects16:19
dprincebut, we've also agreed that we won't make architectural changes without working CI on these things. Upgrades/Network ISO16:19
bnemecdprince: So regarding CI on upgrades, I think there's two different issues in play.16:19
bnemecOne is that trown wants to branch at the same time as everyone else so we have a thing to deploy RDO Mitaka right away.16:19
shardydprince: we can agree trunk is still restricted after we branch pending CI changes, sure16:19
dprinceshardy: so whether we branch or not it really is a matter of do we want to merge code that works16:19
shardybut that doesn't mean we don't have to release mitaka IMO16:20
bnemecThe other is that you want to start working on major architectural changes that will cause backport headaches.16:20
bnemecI've been treating the etherpad as addressing the former.16:20
dprinceshardy: releasing Mitaka without CI on these features (or features from last release) would likely mean releasing a broken Mitaka16:20
dprincebnemec: I don't want the backport headaches16:21
shardydprince: we have to release *something*, even if we don't land all those features16:21
shardywe can't just say, let's not bother branching16:21
dprinceshardy: I'm actually fine with that16:21
shardythat's not the way the OpenStack release model works16:21
shardy(not that TripleO has ever properly respected it)16:21
trownlol16:21
dprinceEither we choose a hard line date, or we choose the set of features we want. Not both16:22
dprinceThis etherpad represents the worst possible compromise in that we are trying to choose both16:22
trownI agree to that bit16:22
shardydprince: Yeah, really we have to do better next cycle, and declare a feature freeze, like other projects16:22
shardythen we have a better window for stablizing things and ensuring we're happy before releasing16:22
shardydprince: I agree there's too much in the etherpad16:23
shardyI was hoping we'd have a small list of stuff, so we could focus review attention16:23
shardyinstead, it's turned into a feature crunch :(16:23
dprinceshardy: Yeah. This etherpad is what I've been looking at https://etherpad.openstack.org/p/tripleo-ipv6-support16:23
*** jaosorior has quit IRC16:24
slagletrown: when are the rdo mitaka repos getting setup?16:24
slaglei'd think we would need to branch by then16:25
slaglebut if the needed features aren't landed, what are we saying? that we allow some feature backport exceptions for mitaka?16:25
openstackgerritBrad P. Crochet proposed openstack/tripleo-common: Upgrades: Add StackUpgradeManager  https://review.openstack.org/28856816:26
shardysounds like we don't have any other option16:26
slagleshardy: yea :(16:27
shardythen we'll advertise an actual FeatureFreeze for Newton so this doesn't happen again16:27
trownslagle: asked apevec in #rdo16:27
slaglewhich means double the backport work, etc16:27
slaglebut i don't see another way16:27
slaglewell...16:27
shardyslagle: that's still a better outcome than having features backported throughout the entire forthcoming cycle16:27
slaglethe other way would be to trim the scope of what we need to backport16:27
dprinceif there is backport work from Newton to Mitaka I don't want to see us hold up new features to make it easier16:28
slaglewell, i think it would be just what is not landed from this etherpad in master, when we branch mitaka16:29
*** xinwu has joined #tripleo16:29
bnemecWe may have to have two dates - a branch date, and a "no more backports" date.16:29
trownthat etherpad is huge though16:29
bnemecIPv6 and upgrades are the two really scary ones though.16:29
ayoungA google search for ruby puppet rabbit is not too informative except if you want a hand puppet of a rabbit16:30
*** penick has joined #tripleo16:30
*** pblaho has quit IRC16:30
openstackgerritDmitry Tantsur proposed openstack/python-tripleoclient: Allow 'openstack baremetal configure boot' to guess the root device  https://review.openstack.org/28841716:31
dtantsurlucasagomes, ^^16:31
dprincederekh: should I have rebased my patch to pick up your latest tripleo-ci fix to cleanup the VMs?16:32
dprincederekh: I noticed the ceph job also failed due to an Ironic deployment failure (the Heat server resource failed to deploy)16:33
derekhdprince: its merged so your patch would be rebased by zuul wouldn't it? checking16:33
dprincederekh: perhaps, but that perhaps means the cleanup isn't working yet.16:34
*** aufi has quit IRC16:34
*** mgould has joined #tripleo16:34
*** ifarkas has quit IRC16:37
derekhdprince: your ceph ci run had gotten the patch and it doesn't look like an ironic failure to me16:38
derekh2016-03-04 16:27:59.448 | | CephStorageAllNodesValidationDeployment   | e98082e6-3453-48c5-a229-af5af3ea6a0b          | OS::Heat::StructuredDeployments                   | CREATE_FAILED   | 2016-03-04T16:05:34 |16:39
dprincederekh: oh, maybe I got mixed up16:39
slagleoh, that sounds like the bug i just filed16:39
trownslagle: dprince, from #rdo [11:38:38] <apevec> trown, we'll not release without tripleo16:39
slaglederekh: https://bugs.launchpad.net/tripleo/+bug/155324316:40
openstackLaunchpad bug 1553243 in tripleo "CI: Resource CREATE failed: Error: resources.ComputeAllNodesValidationDeployment.resources[0]" [Critical,In progress] - Assigned to James Slagle (james-slagle)16:40
slaglesame thing looks like16:40
dprinceslagle: yea, yours is different16:40
slaglei think i saw this a few times before the testenv redeploy...so not sure it's related to that16:40
dprinceslagle: sorry, yours is the same issue I hit w/ Ceph16:40
trownslagle: dprince, however the goal of RDO is always to release within 2 weeks of GA and I would not like to hold that up if possible16:40
derekhslagle: dprince yup, thats the issue I'm looking at16:41
derekhtrown: Do you also know when an initial Mitaka branch will be available, although not necessarily released16:43
trownderekh: in what context? we could create rdo-mitaka packaging branches now if we wanted to16:43
trownderekh: you mean for the rest of openstack? I would guess as soon as RCs start popping up16:44
derekhtrown: ok16:44
*** yamahata has quit IRC16:44
dtantsurderekh, trown, for openstack: oslo libraries are getting mitaka branches pretty soon, then clients. services after they get the 1st RC16:45
derekhtrown: the context I was wondering about is that whenever we do create branches we would want to use a mitaka repository to test them against16:45
*** yamahata has joined #tripleo16:45
trownand not being able to deploy the rest of openstack is a clear blocker for release16:45
*** penick has quit IRC16:46
trownderekh: I think that delorean trunk is fine for that16:46
dprinceslagle: I'm still wondering if there is a subtle cleanup bug that is causing 155324316:47
dprinceslagle: testenv-cleanup but16:47
dprincebug16:47
slagledprince: oh like another vm is holding onto the IP?16:47
dprinceslagle: or a bridge or something16:47
slagleyea could be16:47
derekhtrown: maybe, couldn't we get bitten by projects removing deprecated code for example16:48
openstackgerritCarlos Camacho proposed openstack/tripleo-heat-templates: Remove forced rabbitmq::file_limit conversion to string  https://review.openstack.org/23298316:48
slagletrown: i'm not sure about testing our mitaka branches against trunk delorean, seems like we'd just have all the issues we currently do with trunk chasing16:48
derekhwe would need a mitaka repository to make sure the api we talk to stay stable16:49
trownslagle: derekh, ah I see, well as soon as we start seeing mitaka RCs we can branch delorean16:49
trownbecause just like liberty it will try to build the configured release (mitaka in this case) and fall back to master if there is not a stable/mitaka branch yet16:50
derekhtrown: cool, makes sense,16:50
slaglesounds reasonable16:50
slaglethat will at least give us some stability16:50
jistrccamacho: cool, thanks for submitting the rebase!16:51
ccamacho:) Thank you for the help!16:51
*** rcernin has quit IRC16:51
*** jcoufal has quit IRC16:53
bnemecgdi, just tripped the breaker in my office again.16:57
* bnemec should have had them wire it for 50 amps16:57
slagleshut some of those lava lamps off16:58
trownjistr, there may have been some other cruft in my environment causing ping test to fail... I just redeployed with your patch `virt-customized` onto my undercloud.qcow2 and the ping test even passed.16:58
trownwhich means we are super close to being back able to deploy trunk16:58
shardy\o/16:58
jistrneat :)16:59
trownmy yes hack for tripleo.sh is not working very well... it causes things not to hang, but instead exit 1 even though the ping test succeeded16:59
*** devvesa has quit IRC16:59
*** dsneddon has quit IRC16:59
*** jobewan has joined #tripleo17:00
bnemecslagle: It also turns out that my UPS is utterly worthless.  The last two times this has happened it just sits there beeping at me, with the battery at 100% but 0 output voltage.17:00
derekhsounds uninterruptible to me17:02
slagledid you plug your stuff into the correct side?17:02
*** dsneddon has joined #tripleo17:03
*** Goneri has joined #tripleo17:04
*** jobewan has quit IRC17:05
*** absubram has joined #tripleo17:13
*** absubram_ has joined #tripleo17:14
*** absubram has quit IRC17:18
*** absubram_ is now known as absubram17:18
*** trown is now known as trown|lunch17:20
*** fgimenez has quit IRC17:20
*** xinwu has quit IRC17:22
*** masco has quit IRC17:23
openstackgerritKarim Boumedhel proposed openstack/puppet-pacemaker: When using Rhevm stonith mechanism, fence packages could allready be referenced somewhere else in the code so such a requirement shouldnt sit in the defined type. Constraints on the stonith resources are also unecessary  https://review.openstack.org/28852717:26
openstackgerritKarim Boumedhel proposed openstack/puppet-pacemaker: puppet-pacemaker rhevm stonith fails  https://review.openstack.org/28852717:28
*** dtantsur is now known as dtantsur|afk17:32
*** yamahata has quit IRC17:33
*** mbound has joined #tripleo17:36
*** xinwu has joined #tripleo17:37
derekhdprince: you got 2 of those "is not pingable" errors and a db connection error, recheck17:40
derekhslagle: is that error intermittent ?17:40
dprincederekh: yeah, this not pingable error concerns me17:40
dprincederekh: any ideas? I just reviewed the testenv's and I don't see anything jumping out at me17:41
*** gfidente has quit IRC17:42
derekhdprince: no clue, according to logstash its only happened 3 times in the last week, although it may not have caught up yet http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20pingable.%20Local%20Network%5C%2217:44
*** jistr has quit IRC17:46
slaglederekh: yes, it's transient17:47
*** jprovazn has quit IRC17:48
*** ohamada has quit IRC17:48
slaglei think i saw it earlier this week before the testenv redeploy17:48
*** dshulyak has left #tripleo17:51
*** trown|lunch is now known as trown17:53
trownslagle: any chance we can get a quick +A on https://review.openstack.org/#/c/288460 it passed CI and dprince already +217:53
trownI think that is the only trunk blocker17:53
slaglethere's always a chance man17:53
trown:)17:53
*** lucasagomes is now known as lucas-beer17:54
derekhtrown: I'm just after remembering something you said earlier about not knowing how trunk nonha passed17:57
openstackgerritMerged openstack/tripleo-heat-templates: Run keystone-manage bootstrap for HA deployment too  https://review.openstack.org/28846017:57
slaglein general our CI jobs seem much slower after the testenv redeploy17:57
slagleis it just me imagining things?17:57
trownderekh: I think it is because ansible is getting a tty17:58
derekhtrown: looks like the pingtest passed, then failed to delete the tenant stack17:58
derekh2016-03-04 08:27:00.894 | tripleo.sh -- Overcloud pingtest - time out waiting to delete tenant heat stack, please check manually17:58
trownderekh: ya, that is different, and fixed by a neutronclient patch today17:58
derekhtrown: ok17:58
trownderekh: RDO CI was hanging waiting for 'y' confirmation to heat stack-delete17:59
derekhtrown: got ya17:59
trownderekh: my super elegant fix :p : https://github.com/redhat-openstack/tripleo-quickstart/commit/447f127f34dbf4069937581c11d2698ac116fd3018:00
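[editor's note: a rough sketch of the class of workaround trown links above, not the literal commit; newer heatclient prompts for confirmation on stack-delete, so the answer is piped in and the delete is then polled.]
```bash
yes | heat stack-delete overcloud
# poll until the stack is actually gone (add a timeout in real use, or this
# loops forever if the stack lands in DELETE_FAILED)
while heat stack-show overcloud >/dev/null 2>&1; do
  sleep 10
done
```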
derekhslagle: I hadn't noticed but maybe, looks like you could be correct, once a few more jobs finish we'll have a better idea18:00
bnemecslagle: The patch I'm watching just got to introspection, and it's been running for two hours. :-/18:01
*** mbound has quit IRC18:02
slaglederekh: we did land the patch to enable swap, but a few jobs ran with that before the testenv deploy in their normal times18:02
slagleiirc18:02
slaglethe swap enablement itself takes about 3 minutes18:02
slaglebut if we are now heavily going into swap...that could explain the slow down18:02
*** tosky has quit IRC18:02
*** electrofelix has quit IRC18:03
*** yamahata has joined #tripleo18:04
*** ccamacho has quit IRC18:06
derekhslagle: ok, it could be any number of things taking up the time, a profile comparison of a fast test vs. a slow one could help to find out18:08
*** shivrao has joined #tripleo18:10
*** derekh has quit IRC18:11
bnemecUgh, our undercloud log tar went from ~7  MB yesterday to 30 today.18:13
bnemecDid RDO start logging to systemd today or something?18:13
bnemecThe journal files are taking up most of the space.18:14
trownall the openstack services have been logging to the journal since I started working on openstack18:20
*** mbound has joined #tripleo18:20
openstackgerritBen Nemec proposed openstack-infra/tripleo-ci: Enable undercloud ssl on nonha job  https://review.openstack.org/27374318:20
trownwhich was not yesterday :P18:20
*** rbrady has quit IRC18:21
*** mbound has quit IRC18:24
openstackgerritSteven Hardy proposed openstack/tripleo-common: tripleo.sh - build puppet modules from source for stable branches  https://review.openstack.org/28862618:25
*** shakamunyi has joined #tripleo18:29
*** tzumainn has quit IRC18:29
*** tzumainn has joined #tripleo18:29
openstackgerritMerged openstack/instack-undercloud: mysqld config: set innodb_file_per_table to ON  https://review.openstack.org/28522718:31
*** penick has joined #tripleo18:37
*** akrivoka has quit IRC18:38
*** cwolferh has quit IRC18:38
*** mgould has quit IRC18:40
*** pcaruana has quit IRC18:41
slaglelooking at the logs from one of the ha jobs that took 2h44m, one controller was 400mb into swap and the other 2 were ~100mb into swap18:42
slaglethat doesnt seem so bad18:42
slaglebut maybe it is18:42
*** xinwu has quit IRC18:43
slagleundercloud 1.2G into swap18:43
slaglewait, how much ram should the underclouds have? 5gb or 6gb?18:45
slaglei thought we bumped it to 618:45
slagleor did we only bump it to 5?18:45
*** xinwu has joined #tripleo18:46
slaglethey only have 5, and i think that's a big part of the problem, they are using too much swap18:46
bnemecslagle: Only 5, but there are 2 GB of swap now.18:46
slaglei'm on one right now18:46
slagleand it's slow as molasses18:46
slaglenova list takes 2 minutes18:46
bnemecFor some reason, swift-proxy sometimes seems to cache all of the images in memory, so it ends up eating over 1 GB of memory itself.18:46
bnemecFor a service we barely use. :-/18:46
ayoungIf I did a tripleo undercloud install, how can I redo it without blowing away the instack vm?18:49
*** xinwu has quit IRC18:50
*** athomas has quit IRC18:51
slaglein theory, you can just rerun the installer18:53
*** ccamacho has joined #tripleo18:53
slaglebut there is a bug right now preventing that18:53
trownthere are also some things that are not redoable18:55
trownsince not everything is puppet18:55
slagletrown: you mean they fail when you rerun them?18:56
bnemecI have some commands that attempt to wipe an installed undercloud without rebuilding it, but they only work about 50% of the time in my experience.18:56
trownslagle: no, they just don't change18:56
slagletrown: oh i see18:56
bnemecIt's what I do when I get to the point of needing to rebuild, but don't want to completely start from scratch.18:56
trownI am failing to think of an example, but I have hit it before18:56
slagleyea, the coveted openstack uninstaller18:57
slaglepractically the most requested rfe18:57
trownif only there were a way to quickly deploy from a pre-built qcow218:57
slagleif only :)18:57
bnemecYeah, cause that wouldn't run afoul of idempotency issues... ;-)18:58
trownbnemec: if the undercloud has not been installed yet, and is just all the packages it wont :)18:59
trownbnemec: tripleo-quickstart on my dell mini goes from nothing to ready to deploy in about 10 min18:59
*** isq has joined #tripleo19:09
trowndib elements are a pretty awful way to install packages :), so many yum transactions19:09
dprinceI'm +2 on the initial IPv6 patch. Tested it locally....19:10
dprincehttps://review.openstack.org/#/c/235423/19:10
*** mburned is now known as mburned_out19:10
*** xinwu has joined #tripleo19:10
*** xinwu has quit IRC19:13
*** ccamacho has quit IRC19:16
EmilienMwe miss a last +A now :)19:21
EmilienMdprince: we need to land https://review.openstack.org/#/c/278979/ first19:22
*** ccamacho has joined #tripleo19:23
bnemectrown: From what I hear, puppet isn't any better.  And actually, if we would have ever bothered to convert all of the package installs in dib to the new methods, I think it would be better.19:25
dprinceEmilienM: done19:25
bnemecThe problem is we have a bunch of elements still doing independent installs.  I believe the non-deprecated method actually rolls all of the installs into one so you don't have to run yum once per element.19:26
EmilienMdprince: thx19:26
trownbnemec: ah that would be better19:28
trownbnemec: for building the undercloud.qcow2 for RDO, I just have a big list of every package and pre-install them19:28
openstackgerritMerged openstack/tripleo-heat-templates: Allow for usage of pre-allocated IPs for the management network  https://review.openstack.org/27897919:29
bnemectrown: Yeah, at some point we should audit the elements we use and make sure they all get converted to https://github.com/openstack/diskimage-builder/tree/master/elements/package-installs19:30
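[editor's note: a hedged sketch of the conversion bnemec suggests; the element name and package list are made up for illustration.]
```bash
# Declare packages in package-installs.yaml instead of calling yum from each
# element's install.d script, so dib can batch them into one transaction.
mkdir -p elements/my-element
cat > elements/my-element/package-installs.yaml <<'EOF'
jq:
tmux:
EOF
```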
*** ccamacho has quit IRC19:33
*** ccamacho has joined #tripleo19:34
*** jcoufal has joined #tripleo19:36
*** panda has quit IRC19:38
*** panda has joined #tripleo19:38
*** ccamacho has quit IRC19:40
EmilienMbnemec: can you look https://review.openstack.org/#/c/235423/ when you have time?19:40
*** tzumainn has quit IRC19:41
trownbnemec: ya, there are also a bunch of elements that have a single line of bash, which seems an inefficient way to construct an image too19:46
trownsome of them are just hiding packaging bugs: https://github.com/redhat-openstack/tripleo-quickstart/blob/master/playbooks/roles/images/build/templates/dib-workaround-default.sh.j219:48
*** mburned_out is now known as mburned19:50
dprinceslagle: could this issue be related to the swap file?19:51
trown0 to "ping test validated HA overcloud" in 35 min. http://fpaste.org/334085/45712108/19:51
slagledprince: which? :)19:52
dprinceslagle: sorry, this one https://bugs.launchpad.net/tripleo/+bug/155324319:52
openstackLaunchpad bug 1553243 in tripleo "CI: Resource CREATE failed: Error: resources.ComputeAllNodesValidationDeployment.resources[0]" [Critical,In progress] - Assigned to James Slagle (james-slagle)19:52
*** shivrao has quit IRC19:52
dprinceslagle: we just turned that on yesterday right?19:52
slagledprince: i think we turned it on wednesday19:53
slaglerelated? i don't see how, but anything is possible if things are just running really slow19:53
slaglei don't think we would have gone into swap during the validationdeployment19:54
slagleas none of openstack is started yet19:54
slaglethings are just crawling in CI, it could be the swap file19:55
dprinceslagle: yeah, super slow19:55
slaglebut even now, the job i'm looking at, it's only at initially deploying the oc, and the undercloud has a load of 2019:55
slaglethe swap change did merge yesterday actually, https://review.openstack.org/#/c/286793/19:57
dprincetrown: back in the day I shot for 25 minutes w/ smokestack19:57
dprincetrown: and I had it under 20 at times19:57
trowndprince: nonha can do that :)19:57
dprincetrown: yeah, I wasn't using HA19:57
trownha on a 32G host is just tight resource-wise19:57
slaglebut if you look at the CI on that patch, the times are normal19:57
*** weshay has quit IRC19:57
dprinceslagle: could be because it was just a single run using it19:58
dprinceslagle: and now in parallel it is getting messy19:58
dprinceslagle: it is hard to say, we've had so much changed this week. Like I rebuilt the testenvs yesterday18:58
dprinceslagle:  new code... plus this swap patch19:59
slaglewe could try backing it out, but we'd have to revert the aodh patch as well19:59
slagleotherwise, nothing will pass19:59
dprinceslagle: yeah. I'm gonna say aodh isn't the priority right now.20:00
dprinceslagle: I get everyone wants their patch in, but we are in a bad place at the moment20:00
EmilienMI don't know why we need ceilometer & aodh on the undercloud20:01
EmilienMI don't see any use case20:01
slagleEmilienM: this was the overcloud patch20:01
slagleit drove up memory just enough so that stuff was getting oom killed20:02
EmilienMwe need to reduce workers to 2 for all we can20:02
EmilienMwe had this problem in puppet CI (at lower scale ok) but we managed with reducing workers to 220:03
EmilienMnot sure how much workers are set by default for openstack services20:03
trownEmilienM: I think we already reduce them all to 120:03
EmilienMon both uc/oc? for all services?20:03
trownjust oc I think, looking for where that heat environment is20:04
openstackgerritJames Slagle proposed openstack/tripleo-heat-templates: WIP: Revert "Deploy Aodh services, replacing Ceilometer Alarm"  https://review.openstack.org/28871420:05
dprinceEmilienM: we may be able to reduce some workers... but stevebaker specifically wanted Heat engine at 4 workers for us in the undercloud. That is a Gig20:05
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: WIP: Revert "Use swapfile environment in CI"  https://review.openstack.org/28871620:06
slaglei guess we can see what happens with that ^^20:06
dprinceslagle: I can swapoff the testenv's myself20:06
dprinceslagle: that could be part of the issue20:06
*** mburned is now known as mburned_out20:06
trownEmilienM: found it https://github.com/openstack-infra/tripleo-ci/blob/master/test-environments/worker-config.yaml20:06
EmilienMthe swap is only useful when you have SSD imho20:06
*** derekh has joined #tripleo20:06
dprinceslagle: derekh just enable swap yesterday I think20:07
EmilienMif you enable swap on slow disk, things can be worse than before20:07
dprinceEmilienM: agree20:07
EmilienMtrown: thanks! I was trying to find it, you are faster :-P20:07
* dprince is ready to run swappoff on all the nodes20:07
derekhslagle: was thinking about what you mentioned while at dinner, something occurred to me, popping on to mention it20:07
derekhdprince: stop20:07
trownEmilienM: it moved, had to look up the review that moved it20:07
dprincederekh: stopped20:08
derekhdprince: slagle 2016-03-04 16:41:13.369 | + ./testenv-client -b 192.168.1.1:4730 -t 10200 -- ./toci_instack.sh20:08
slagle i hope i wasn't drunk at this dinner20:08
derekh2016-03-04 17:44:04.691 | 2016-03-04 17:44:04,663 - testenv-client - INFO - Received job : {20:08
derekhthe jobs are wait up to an hour to get testenvs20:08
derekhdprince: so we got a shortage to testenvs20:08
*** rbrady has joined #tripleo20:08
dprincederekh: I just created 1 more20:08
derekhdprince: I assume you were about to swapoff the testenvs?20:08
EmilienMtrown: 1 worker is a bit low imho - 2 would be ideal20:09
EmilienMtrown: 1 worker might expose you to some api timeouts20:09
dprincederekh: IPMI isn't working on 3-4 of them (we need to get them fixed)20:09
EmilienMdo we have api timeouts?20:09
dprincederekh: I was going to try to swapoff20:09
derekhdprince: they aren't running slowly, we just haven't enough20:09
trownEmilienM: I have not hit that, I use the same environment in RDO CI20:10
dprincederekh: the reason why is I just tried it again on testenv18 (my testbed) and it failed there with this ComputeAllNodesValidation error20:10
dprincederekh: I didn't see that yesterday20:10
dprincederekh: or last night rather20:10
*** Marga_ has joined #tripleo20:10
trowndprince: its not the python-ipaddr package missing is it?20:11
trowndprince: I hit that after we set -e the validation script20:11
bnemectrown: So the issue I'm having with deleting heat stacks seems to only happen with network isolation enabled.  Were you using it when you had problems?20:11
derekhdprince: so maybe we're talking about two different things, I'm talking about the fact that slagle mentioned jobs were running slower since the rebuild20:11
dprincetrown: I had another patch to fix that via overcloud-base in tripleo-puppet-elements. It landed20:11
derekhdprince: and now are timing out completly (all jobs)20:11
trownbnemec: nope, my issues were sans net iso20:11
dprincederekh: perhaps 2 separate issues20:12
derekhdprince: the reason for that is that all of them are waiting for over an hour to get a testenv20:12
pradkslagle, is the aodh patch causing the gate failure? it was passing yesterday though?20:12
dprincederekh: the slowness isn't helping debug the other one20:12
dprincederekh: I just brought up the swap change as it was also new this week20:12
*** weshay has joined #tripleo20:12
bnemectrown: Okay, so my problem is most likely a different bug.  I'll open another one to separate my issue from yours.20:13
slaglepradk: it's just a theory that enabling swap slowed things down20:13
derekhdprince: we only have 31 testenvs, we need about 5020:13
derekh[heat-admin@testenv18-testenv0-avobrytykidu ~]$ nc 192.168.1.1 4730 | grep lock20:13
derekhstatus20:13
derekhlockenv 31      31      4820:13
trownbnemec: my issue was fixed with the python-neutronclient patch that landed btw20:13
dprincederekh: I can fire up some more20:14
bnemectrown: Yeah, I'm not running against trunk though, and I think my error is internal to Heat.20:14
trownah k, probably different issue then20:14
dprincederekh: but I think I'd still like to consider dropping the swap for now20:14
derekhdprince: that should fix all these timeouts we're getting http://tripleo.org/cistatus.html20:14
pradkslagle, ah ok, so since this depends on swap patch, we're reverting this too?20:14
derekhdprince: why? whats wronge with the swap?20:14
derekhdprince: we've had it for months20:15
slaglederekh: different patch20:15
slaglewe enabled swap on the oc nodes20:15
dprincederekh: yeah, for aodh20:15
derekhdprince: slagle sorry, I'm still talking about testenvs, do what ye want with the other swap ;-)20:16
slaglelet's see where the more testenv's gets us20:16
dprincederekh: I'll fire up a few more now20:16
jpeelerthrash: did you have any plans on pushing this review through - https://review.openstack.org/#/c/235569/20:16
dprincederekh: like I said, old nova baremetal isn't playing nicely with all the IPMIs so it's taking some time20:17
derekhdprince: cool, jobs should at least stop timing out then and you can see real errors20:17
thrashjpeeler: I'm trying. :)20:17
derekhdprince: yup, its a pain in the ass20:17
derekhdprince: slagle ok I gotta run, just jumped in to mention what I noticed, ttyl20:17
slaglethx20:18
*** derekh has quit IRC20:18
openstackgerritBrad P. Crochet proposed openstack/tripleo-common: Build image files from definitions in yaml  https://review.openstack.org/23556920:19
*** jcoufal has quit IRC20:19
jpeelerthrash: ok. if you're able to get your patch in and then i get mine in, it'll save me a lot of documentation work! thanks for staying on it20:20
thrashjpeeler: fixed the requirement (thought I had already TBH)20:20
*** rbrady_ has joined #tripleo20:20
*** rbrady has quit IRC20:21
*** xinwu has joined #tripleo20:24
jayganyone around who could give a second +2 to a small puppet-tripleo backport? https://review.openstack.org/#/c/287974/ slagle approved the version for master yesterday20:29
bnemectrown: Ah, with trunk Heat I can delete my overcloud again.  So whatever bug I'm hitting in tripleo-current seems to be fixed already.20:29
trownnice20:30
slaglebnemec: i was just about to ask if you solved that yet20:30
slagleso just update to current?20:30
bnemecslagle: I added ",openstack-heat-common,openstack-heat-api,python-heatclient,openstack-heat-api-cfn,openstack-heat-engine,openstack-heat-templates" to includepkgs in delorean-current.repo20:31
bnemecThe yum update openstack-heat-engine.20:31
*** thrash is now known as thrash|brb20:31
bnemec*Then20:32
slaglethanks, will try20:32
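[editor's note: a rough sketch of the repo tweak bnemec describes, assuming the usual /etc/yum.repos.d/delorean-current.repo path on the undercloud.]
```bash
sudo sed -i '/^includepkgs=/ s/$/,openstack-heat-common,openstack-heat-api,python-heatclient,openstack-heat-api-cfn,openstack-heat-engine,openstack-heat-templates/' \
    /etc/yum.repos.d/delorean-current.repo
sudo yum -y update openstack-heat-engine
```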
jaygthanks slagle!20:33
*** bvandenh has quit IRC20:33
*** jtomasek has quit IRC20:33
openstackgerritMerged openstack/puppet-tripleo: loadbalancer: fix Redis timeout HAproxy config  https://review.openstack.org/28797420:36
*** weshay has quit IRC20:39
*** rbrady_ has quit IRC20:42
*** ccamacho has joined #tripleo20:42
stevebakerdprince, EmilienM: what we really need is undercloud heat-engine workers to be tuned to the expected size of the overcloud. Also raising the number of workers happened before we discovered the rpc timeout regressions so some of those failures on small (1 core) underclouds may not have been deadlocks20:44
dprincestevebaker: think we might should try to go back down to 2 for a default?20:45
stevebakerdprince: the default is unset, which creates a worker per core20:46
dprincestevebaker: oh, right. so maybe we do just need to pin lower than that if we want/need to20:47
dprincestevebaker: for CI20:47
stevebakerdprince: I mean now the default is max(4, cores)20:49
openstackgerritBrad P. Crochet proposed openstack/python-tripleoclient: Upgrades: Add 'stack upgrade' command  https://review.openstack.org/28660620:50
openstackgerritBrad P. Crochet proposed openstack/python-tripleoclient: Upgrades: Add --post option to 'stack upgrade'  https://review.openstack.org/28872820:50
stevebakerdprince: It used to be (cores), maybe upstream could change to max(2, cores)20:50
dprincestevebaker: exactly20:51
stevebakerdprince: or instack-undercloud should just infer an appropriate value to set heat.conf num_engine_workers to20:51
stevebakerwith its knowledge of memory/cores20:52
stevebakerand its certainty that an undercloud is a single host heat20:52
*** weshay has joined #tripleo20:53
stevebakerdprince: its probably not appropriate to set the upstream heat default to what undercloud needs - its not exactly a typical production heat setup20:53
dprincestevebaker: I agree there20:54
stevebaker(single stack, single host)20:54
openstackgerritRyan Hallisey proposed openstack/tripleo-heat-templates: Parameterize the heat-docker-agents image  https://review.openstack.org/28873120:54
stevebakerdprince: I would suggest CI set num_engine_workers to 2 via ansible, and keep an eye on rpc timeouts20:56
trownhehe at "via ansible"20:56
trownvia bash I think20:56
stevebakertrown: oh, I assumed there was ansible all over20:57
trownRDO and downstream yes... tripleoci not so much20:57
stevebakeralso is there memory pressure on the small overcloud controllers?20:57
stevebakercrap, puppet-heat doesn't have an option for num_engine_workers20:58
*** ccamacho has quit IRC20:58
trownstevebaker: ya have to use heat::config::heat_config20:59
stevebakertrown: ok, that could always be set 1 overcloud heat-engine worker per controller (or whatever the tempest tests need to run)21:00
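[editor's note: a hedged sketch of stevebaker's suggestion for the undercloud in CI, assuming crudini is available; the heat option is num_engine_workers and the RDO service name is openstack-heat-engine.]
```bash
sudo crudini --set /etc/heat/heat.conf DEFAULT num_engine_workers 2
sudo systemctl restart openstack-heat-engine
# (for the overcloud, the chat suggests setting it via puppet's
# heat::config::heat_config, since puppet-heat had no dedicated parameter then)
```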
*** cwolferh has joined #tripleo21:01
stevebakertrown, dprince: does a ci undercloud really need 13 nova-api and 6 nova-conductor?21:02
dprincestevebaker: no, where do you see those?21:02
stevebakerdprince: my random undercloud21:02
stevebakerdprince: the last puddle on a 8G 4 core vm21:03
trownwe could probably tune those down by default too21:05
*** thrash|brb is now known as thrash21:13
openstackgerritBrad P. Crochet proposed openstack/tripleo-common: Upgrades: Add post-upgrade stack update  https://review.openstack.org/28874421:15
*** penick has quit IRC21:16
shardyWe do already tune these workers somewhat for the overcloud: https://review.openstack.org/#/c/273431/8/toci_instack.sh21:17
shardye.g in CI21:17
shardyanyway, night all, have a good weekend :)21:17
*** dprince has quit IRC21:18
bnemecYou too, shardy21:18
*** shardy has quit IRC21:18
openstackgerritBrad P. Crochet proposed openstack/python-tripleoclient: Upgrades: Add --post option to 'stack upgrade'  https://review.openstack.org/28872821:20
openstackgerritJames Slagle proposed openstack/tripleo-heat-templates: Make AllNodesExtraConfig depend on the validation deployments  https://review.openstack.org/28874721:22
*** dmsimard has quit IRC21:23
*** shivrao has joined #tripleo21:25
*** penick has joined #tripleo21:38
*** jayg is now known as jayg|g0n321:40
*** rlandy has quit IRC21:43
*** rlandy has joined #tripleo21:44
*** r-mibu has quit IRC21:47
*** r-mibu has joined #tripleo21:47
*** mburned_out is now known as mburned21:50
*** gfidente has joined #tripleo21:55
openstackgerritRichard Su proposed openstack/tripleo-heat-templates: Store events in Ceilometer  https://review.openstack.org/28756121:58
*** pcaruana has joined #tripleo21:58
gfidenteyeh! I'm going to get ipv6 working, no matter what21:59
*** lblanchard has quit IRC21:59
*** cwolferh has quit IRC22:02
*** cwolferh has joined #tripleo22:04
*** rlandy has quit IRC22:04
*** dustins has quit IRC22:10
*** weshay has quit IRC22:13
Erming_trown: are you there. a question about the overcloud:22:24
Erming_trown: why after the deployment, almost all the services are inactive (disabled on boot) except for the swift ones?22:25
trownErming_: sorry, I have to run, but if it is an HA or pacemaker setup, it is because pacemaker is managing the services not systemd22:26
*** trown is now known as trown|outtypewww22:26
Erming_trown: Thanks22:26
Erming_trown|outtypewww: Have a great weekend (hope you could see it though :-)22:27
*** rhallisey has quit IRC22:42
*** penick has quit IRC22:48
*** dsneddon has quit IRC22:54
*** xinwu has quit IRC22:56
*** tiswanso has quit IRC22:58
*** xinwu has joined #tripleo22:59
*** xinwu has quit IRC22:59
*** Goneri has quit IRC23:00
*** xinwu has joined #tripleo23:02
*** xinwu has quit IRC23:02
*** morazi has quit IRC23:04
*** rwsu has quit IRC23:08
*** rwsu has joined #tripleo23:09
*** mbound has joined #tripleo23:13
*** Goneri has joined #tripleo23:16
*** trozet has quit IRC23:32
*** jobewan has joined #tripleo23:35
*** jdob has quit IRC23:36
gfidenteEmilienM, you around?23:50
*** mbound has quit IRC23:59
