*** ayoung has quit IRC | 00:07 | |
openstackgerrit | Ben Nemec proposed openstack/tripleo-common: Add capabilities filter for Nova https://review.openstack.org/288087 | 00:11 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Configure nova to use custom scheduler filter https://review.openstack.org/288188 | 00:14 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Configure nova to use custom scheduler filter https://review.openstack.org/288188 | 00:20 |
*** lblanchard has joined #tripleo | 00:21 | |
*** saneax is now known as saneax_AFK | 00:50 | |
*** ayoung has joined #tripleo | 01:05 | |
*** mbound has quit IRC | 01:16 | |
*** mbound has joined #tripleo | 01:17 | |
*** dprince has joined #tripleo | 01:18 | |
*** shivrao has quit IRC | 01:19 | |
*** yamahata has quit IRC | 01:37 | |
*** shivrao has joined #tripleo | 01:46 | |
*** cwolferh has quit IRC | 01:46 | |
*** dmsimard has quit IRC | 01:50 | |
*** dmsimard has joined #tripleo | 01:51 | |
*** yuanying has quit IRC | 01:58 | |
*** yuanying has joined #tripleo | 01:59 | |
*** trozet has joined #tripleo | 02:08 | |
*** mbound has quit IRC | 02:12 | |
*** cwolferh has joined #tripleo | 02:13 | |
openstackgerrit | Dan Prince proposed openstack-infra/tripleo-ci: WIP: IPv4 network isolation testing https://review.openstack.org/288163 | 02:15 |
*** tiswanso has quit IRC | 02:19 | |
*** tiswanso has joined #tripleo | 02:20 | |
*** shivrao has quit IRC | 02:33 | |
*** dprince has quit IRC | 02:40 | |
*** mburned is now known as mburned_out | 02:42 | |
*** links has joined #tripleo | 03:01 | |
*** lblanchard has quit IRC | 03:07 | |
*** trozet has quit IRC | 03:08 | |
*** yuanying has quit IRC | 03:16 | |
openstackgerrit | Richard Su proposed openstack/tripleo-heat-templates: Store events in Ceilometer https://review.openstack.org/287561 | 03:25 |
*** rhallisey has quit IRC | 03:27 | |
openstackgerrit | Richard Su proposed openstack/instack-undercloud: Store events in Undercloud Ceilometer https://review.openstack.org/286734 | 03:29 |
EmilienM | can I have a review on https://review.openstack.org/#/c/286584/ please? | 03:29 |
*** lblanchard has joined #tripleo | 03:33 | |
*** panda has quit IRC | 03:38 | |
*** panda has joined #tripleo | 03:38 | |
*** yamahata has joined #tripleo | 03:41 | |
*** links has quit IRC | 03:44 | |
*** cwolferh has quit IRC | 04:06 | |
*** yuanying has joined #tripleo | 04:08 | |
*** yuanying has quit IRC | 04:12 | |
openstackgerrit | Michael Chapman proposed openstack/tripleo-heat-templates: Adds OpenDaylight support https://review.openstack.org/200253 | 04:15 |
*** lblanchard has quit IRC | 04:19 | |
*** links has joined #tripleo | 04:27 | |
*** openstack has joined #tripleo | 04:29 | |
*** lynxman has quit IRC | 04:29 | |
*** yuanying has joined #tripleo | 04:30 | |
*** dtantsur|afk has quit IRC | 04:30 | |
*** xinwu has quit IRC | 04:30 | |
*** openstack has quit IRC | 04:30 | |
*** openstack has joined #tripleo | 04:32 | |
*** dtantsur has joined #tripleo | 04:32 | |
*** openstack has quit IRC | 04:34 | |
*** openstack has joined #tripleo | 04:35 | |
*** lynxman has quit IRC | 04:36 | |
*** cwolferh has joined #tripleo | 04:36 | |
*** openstack has quit IRC | 04:36 | |
*** openstack has joined #tripleo | 04:37 | |
*** tiswanso has quit IRC | 04:38 | |
*** openstack has quit IRC | 04:39 | |
*** openstack has joined #tripleo | 14:06 | |
*** tiswanso has joined #tripleo | 14:07 | |
trown | jistr: thanks a ton for all your help | 14:08 |
*** mandre has joined #tripleo | 14:10 | |
openstackgerrit | Merged openstack/tripleo-heat-templates: Set notification driver for nova to send https://review.openstack.org/283686 | 14:13 |
dprince | gfidente: can you join here https://redhat.bluejeans.com/u/dprince/ | 14:14 |
*** Goneri has joined #tripleo | 14:14 | |
*** tzumainn has joined #tripleo | 14:16 | |
openstackgerrit | Merged openstack/tripleo-common: Override OS::Nova::Server for user_data updates https://review.openstack.org/284769 | 14:18 |
*** jprovazn has joined #tripleo | 14:18 | |
gfidente | https://github.com/openstack/os-cloud-config/blob/master/os_cloud_config/keystone.py#L546 | 14:20 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/os-collect-config: Updated from global requirements https://review.openstack.org/264446 | 14:21 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/python-tripleoclient: Updated from global requirements https://review.openstack.org/268528 | 14:22 |
slagle | jistr: trown : i think you might need to base https://review.openstack.org/#/c/288460 on https://review.openstack.org/#/c/244162 | 14:23 |
jistr | trown: hey, i'm back from the call, saw your comment on the patch | 14:23 |
trown | slagle: hmm... but why would nonha pass without that? | 14:24 |
jistr | slagle: i think we're trying to just get the Default keystone domain back, getting the full keystone init via puppet in is orthogonal i think | 14:24 |
*** mandre has quit IRC | 14:24 | |
trown | on a positive note looks like the heat stack-delete issue is resolved | 14:24 |
jistr | slagle: it was previously created by db sync, now we need to run the bootstrap | 14:24 |
jistr | slagle: https://github.com/openstack/keystone/blob/f699ca93fc6f2485ec8e76e907572a2f838cd3cb/releasenotes/notes/no-default-domain-2161ada44bf7a3f7.yaml | 14:25 |
slagle | yea but the ha job was failing while running the bootstrap, https://review.openstack.org/#/c/286352/ | 14:25 |
slagle | that's why it was disabled | 14:25 |
*** dustins has joined #tripleo | 14:25 | |
slagle | only for ha, nonha passed fine | 14:26 |
jistr | slagle: yeah because it tried to run too early | 14:26 |
jistr | (i think :) ) | 14:26 |
trown | although I think the new heatclient feature to prevent me from accidentally deleting my stack is still causing the pingtest cleanup to hang... really confused how the tripleoci periodic job could get past that | 14:26 |
slagle | jistr: ok | 14:26 |
openstackgerrit | Jiri Stransky proposed openstack/tripleo-heat-templates: Run keystone-manage bootstrap for HA deployment too https://review.openstack.org/288460 | 14:30 |
jistr | trown: it *might* have been a race condition indeed. Could you please try again with ^^ ? | 14:31 |
jistr | added explicit ordering | 14:31 |
trown | jistr: sure thing... now that I can cleanly delete stacks retrying is so much nicer | 14:31 |
*** leanderthal is now known as leanderthal|mtg | 14:31 | |
ayoung | So...undercloud neutron uses 192.0? | 14:33 |
ayoung | for the control plane. | 14:33 |
ayoung | And, I know, I know...there are no ranges we can count on not being used | 14:33 |
openstackgerrit | Dmitry Tantsur proposed openstack/python-tripleoclient: [WIP] Allow 'openstack baremetal configure boot' to guess the root device https://review.openstack.org/288417 | 14:33 |
ayoung | but that one...it's even more reserved than the rest | 14:33 |
jistr | slagle: or actually it might have failed previously just because it tried to run on all nodes instead of on $pacemaker_master... going to look if the bootstrap uses API calls or directly goes to DB | 14:35 |
*** openstackgerrit_ has quit IRC | 14:36 | |
trown | jistr: I had problems earlier with the bootstrap running on all nodes | 14:37 |
shardy | ayoung: yes, see undercloud.conf, the provisioning network defaults to 192.0.2.0/24 | 14:37 |
*** openstackgerrit_ has joined #tripleo | 14:37 | |
trown | I think it goes directly to DB | 14:37 |
ayoung | shardy, so, this is kind of a violation of the RFC: | 14:37 |
ayoung | shardy, that is not a usable local, non-routable range. | 14:38 |
ayoung | shardy, and, of course, IPA enforces that. Which puts me in a tricky position trying to integrate | 14:38 |
shardy | ayoung: perhaps we should default to a different subnet, but you can control it via undercloud.conf to be whatever you want | 14:39 |
shardy | it does appear to conflict with rfc5737, you're right | 14:40 |
ayoung | shardy, there are no good answers with IPv4. We need IPv6, which should be usable by now. | 14:40 |
*** mkovacik has quit IRC | 14:40 | |
ayoung | shardy, yeah, found that at midnight last night | 14:40 |
ayoung | shardy, I was not happy | 14:40 |
*** mandre has joined #tripleo | 14:40 | |
ayoung | editing the python of an installed RPM.... | 14:40 |
shardy | ayoung: why editing python? | 14:41 |
ayoung | shardy, heh | 14:41 |
ayoung | shardy, to remove the check in IPA of course | 14:41 |
trown | ah ha, the periodic job is hitting the new heatclient "feature": 2016-03-04 08:27:00.894 | tripleo.sh -- Overcloud pingtest - time out waiting to delete tenant heat stack, please check manually | 14:41 |
shardy | like I said, you should be able to configure it via undercloud.conf | 14:41 |
shardy | otherwise we have two bugs ;) | 14:41 |
*** links has quit IRC | 14:41 | |
ayoung | shardy, yeah...I'm just learning this flow. Let me see... | 14:42 |
shardy | ayoung: can you raise a launchpad bug with your findings please, then we can discuss there and figure out the best plan? | 14:42 |
ayoung | shardy, will do | 14:42 |
jaosorior | ayoung: lol, done that | 14:42 |
jaosorior | (editing the code from an installed rpm, it's nasty... but gotta do what you gotta do) | 14:42 |
jistr | trown: keystone bootstrap doesn't do API calls, and it should probably run before keystone is started. So what i have in the patch is probably wrong, but still there's a possibility that it could succeed, so i wouldn't cancel the test just yet. It probably belongs to step 3 indeed, just on $pacemaker_master i guess though. /cc slagle | 14:42 |
shardy | ayoung: if you're using tripleo.sh then it's probably using the defaults for everything | 14:43 |
shardy | cp /usr/share/instack-undercloud/undercloud.conf.sample ~/undercloud.conf then hack away, and you should be able to refine the addresses to match what you need until we fix the defaults | 14:43 |
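A minimal sketch of what that "hack away" step amounts to (the option names below match the instack-undercloud sample config of the era, but treat them as assumptions):

    # Replace the RFC 5737 documentation range 192.0.2.0/24 with an
    # RFC 1918 private subnet for the provisioning network.
    cp /usr/share/instack-undercloud/undercloud.conf.sample ~/undercloud.conf
    sed -i \
        -e 's|^#\?local_ip.*|local_ip = 172.20.0.1/24|' \
        -e 's|^#\?network_cidr.*|network_cidr = 172.20.0.0/24|' \
        -e 's|^#\?dhcp_start.*|dhcp_start = 172.20.0.5|' \
        -e 's|^#\?dhcp_end.*|dhcp_end = 172.20.0.30|' \
        ~/undercloud.conf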
trown | jistr: ok if this fails I will try on step 3 | 14:44 |
openstackgerrit | Pradeep Kilambi proposed openstack/tripleo-heat-templates: Set notification driver for nova to send https://review.openstack.org/288497 | 14:45 |
ayoung | shardy, https://bugs.launchpad.net/tripleo/+bug/1553222 | 14:45 |
openstack | Launchpad bug 1553222 in tripleo "Default undercloud control plan network violates rfc5737" [Undecided,New] | 14:45 |
shardy | ayoung: thanks! | 14:45 |
ayoung | ah...looovely typp | 14:45 |
openstackgerrit | Jiri Stransky proposed openstack/tripleo-heat-templates: Run keystone-manage bootstrap for HA deployment too https://review.openstack.org/288460 | 14:46 |
* ayoung needs to figure out what range to use | 14:46 | |
jistr | trown: here's code for the next attempt ^^ | 14:46 |
trown | jistr: awesome, current deploy is in step 5, so it should fail (or not) shortly | 14:48 |
*** leanderthal|mtg is now known as leanderthal | 14:56 | |
bnemec | gfidente: derekh: Noticed your earlier discussion about having issues with the postconfig. I've actually run into the same sort of thing locally when I use net-iso. | 14:58 |
bnemec | I think we don't noproxy either the admin or public address, so if you have http_proxy set (like we do in CI), then it tries to go through the proxy and hangs because the proxy doesn't know about the address. | 14:58 |
derekh | ^ dprince that could be it | 14:59 |
*** mandre has quit IRC | 14:59 | |
*** jcoufal_ has quit IRC | 14:59 | |
dprince | derekh, bnemec I don't use no_proxy in my environment either | 14:59 |
derekh | bnemec: sounds fairly plausible, dprince wanna try that in your recheck | 14:59 |
dprince | derekh: yep, lets do it | 15:00 |
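The workaround being agreed on here boils down to one of the following before the overcloud postconfig runs (the VIP addresses are example values, not the actual CI ones):

    # With http_proxy set, the client routes requests to the overcloud
    # endpoints through the proxy, which hangs on addresses the proxy
    # cannot reach.
    export no_proxy="${no_proxy},192.0.2.10,10.0.0.4"
    # ...or simply drop the proxy for this step:
    unset http_proxy https_proxy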
*** jrist has joined #tripleo | 15:02 | |
openstackgerrit | John Trowbridge proposed openstack/tripleo-common: Use 'yes' hack for ping test stack delete https://review.openstack.org/288511 | 15:02 |
openstackgerrit | Dan Prince proposed openstack-infra/tripleo-ci: WIP: IPv4 network isolation testing https://review.openstack.org/288163 | 15:05 |
*** mbound has quit IRC | 15:05 | |
dprince | bnemec: ^^ unset_http there | 15:08 |
dprince | bnemec: https://review.openstack.org/#/c/288163/3/scripts/deploy.sh | 15:08 |
bnemec | dprince: Looks good. | 15:09 |
bnemec | FWIW, the reason I haven't looked into fixing this in the client is that this is all going away once keystone init is done by puppet. | 15:10 |
*** mandre has joined #tripleo | 15:10 | |
dprince | bnemec: I know! I was seriously considering arguing we land that patch to get this working | 15:10 |
bnemec | But given how long that has taken maybe it's worth fixing anyway. | 15:10 |
dprince | bnemec: but there is still an HA issue w/ the keystone puppet stuff, so one step at a time | 15:10 |
bnemec | Yeah | 15:10 |
trown | WOOOOOOOOT!!! Trunk HA overcloud CREATE_COMPLETE!! jistr you rock!! | 15:14 |
jistr | ahahaha awesome :)) | 15:14 |
trown | running ping test to be sure, but I think we may have a winner | 15:14 |
jistr | trown: can you please try with step 3 (the latest version of the patch) too? if step 3 passes too, i think we should use that | 15:14 |
trown | jistr: only latest version of the patch works | 15:15 |
trown | jistr: PS2 did not work | 15:15 |
jistr | trown: ok cool :) | 15:15 |
trown | jistr: interestingly, what is in PS3 is in the keystone init patch, but as suspected there is other stuff there that is not working | 15:16 |
jistr | trown, slagle: i'm still not sure if running keystone bootstrap breaks upgrades, maybe it could, but i think at this point we have no choice, we gotta get the CI green | 15:16 |
slagle | jistr: i didnt think this was breaking CI? | 15:17 |
slagle | that's why the bootstrap running got disabled | 15:17 |
trown | jistr: I confirmed that the keystone-manage bootstrap command can be re-run | 15:17 |
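For context, the command under discussion looks roughly like this (the password and URLs are placeholders); it writes the admin user, project, role, and endpoints directly to the database, which is why re-running it against an already-bootstrapped cloud is safe:

    keystone-manage bootstrap \
        --bootstrap-password "$ADMIN_PASSWORD" \
        --bootstrap-username admin \
        --bootstrap-project-name admin \
        --bootstrap-role-name admin \
        --bootstrap-region-id regionOne \
        --bootstrap-admin-url http://192.0.2.1:35357 \
        --bootstrap-internal-url http://192.0.2.1:5000 \
        --bootstrap-public-url http://192.0.2.1:5000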
slagle | or you're saying we need it | 15:17 |
*** mbound has joined #tripleo | 15:17 | |
trown | slagle: semantics on broken CI | 15:18 |
trown | slagle: CI is broken if we can not deploy trunk | 15:18 |
trown | it is just a hidden breakage so it does not block everyone from working | 15:18 |
trown | but it is still broken | 15:18 |
derekh | trown: more semantics on broken CI | 15:18 |
derekh | CI is not broken, trunk is | 15:18 |
slagle | oh ok, rgr that | 15:18 |
*** trozet has joined #tripleo | 15:19 | |
trown | derekh: ya, but tripleo trunk... not other project trunk | 15:19 |
*** mandre has quit IRC | 15:19 | |
slagle | well, i think that's what derekh is saying | 15:19 |
slagle | or isn't saying | 15:19 |
slagle | i dunno :) | 15:20 |
trown | ya, there are different ways to say it.... but if we have not promoted anything in over a week, tripleo is in crisis | 15:20 |
derekh | openstack trunk is broken and can't be deployed with trunk, | 15:20 |
trown | even though everyone can continue to merge stuff that may very well make the situation worse | 15:20 |
slagle | i mean if keystone and puppet-keystone push a change that breaks the way existing HA deployments work, is that really "tripleo is broken"? | 15:20 |
trown | we dont know because we are testing against the past | 15:20 |
*** jcoufal has joined #tripleo | 15:21 | |
slagle | it is in the sense, that trunk changed in a backwards incompatible way | 15:21 |
slagle | i guess that's acceptable for openstack | 15:21 |
slagle | since it happens...all the time | 15:21 |
derekh | In general we can just say "deploying trunk is broken", doesn't matter what project is the problem | 15:21 |
trown | right... and I think we want to be able to deploy trunk... | 15:22 |
*** mandre has joined #tripleo | 15:22 | |
trown | if not, that is a bigger problem | 15:22 |
derekh | my point here is that "ci is broken" is the term I want people to stop using, | 15:22 |
derekh | unless it actually is broken | 15:22 |
slagle | ci is working at catching failures :) | 15:23 |
trown | derekh: ok, so when dtantsur or pradk or any of the people who are trying to integrate with tripleo ask me why their patch can't pass because we are testing against 8-30 days in the past, I guess that is not CI being broken? | 15:23 |
derekh | trown: well, it's not broken, they would have the exact same problem when trying to test it locally with tripleo.sh | 15:24 |
derekh | trown: its not a problem specific to ci | 15:24 |
openstackgerrit | Giulio Fidente proposed openstack/tripleo-heat-templates: DO NOT MERGE: test CI netiso on liberty branch https://review.openstack.org/288526 | 15:24 |
gfidente | derekh, ^^ makes sense? | 15:24 |
dtantsur | so I guess we can call it "tripleo is broken" :D | 15:24 |
* dtantsur ducks | 15:24 | |
* bnemec steps away from the semantic minefield :-) | 15:25 | |
trown | this just feels like a step in the wrong direction semantically... but I agree it is a minefield | 15:25 |
derekh | our dev process is broken | 15:25 |
slagle | my spirit is broken | 15:26 |
dtantsur | EVERYTHING IS BROKEN | 15:26 |
shardy | 'cmon guys, it's FRIDAY, shall we lighten up? :) | 15:26 |
dtantsur | quickly, post owls | 15:26 |
trown | https://www.youtube.com/watch?v=fKd-07E6ses | 15:26 |
shardy | owls, yes there's a good idea :) | 15:27 |
derekh | Having said all that, ci was broken for a few hours this morning when all the testenvs ran out of RAM, so ya....that happens too | 15:27 |
openstackgerrit | Karim Boumedhel proposed openstack/puppet-pacemaker: When using the Rhevm stonith mechanism, fence packages could already be referenced somewhere else in the code, so such a requirement shouldn't sit in the defined type. Constraints on the stonith resources are also unnecessary https://review.openstack.org/288527 | 15:27 |
dtantsur | quite a long summary, isn't it ^^? | 15:27 |
*** yamahata has joined #tripleo | 15:28 | |
derekh | gfidente: never tried anything like that but looks sane | 15:28 |
trown | jistr: no dice on the ping test... though I am betting this is some other issue that was just hidden by not being able to get to deploy | 15:31 |
trown | jistr: I guess we will see if tripleoci passes | 15:31 |
*** mburned is now known as mburned_out | 15:33 | |
*** mburned_out is now known as mburned | 15:33 | |
jistr | trown, slagle: ok i *thought* that keystone issue was the reason CI was red, apparently i'm really confused today :D | 15:33 |
trown | jistr: it is one of the reasons the periodic CI is red | 15:34 |
*** david_lyle__ has joined #tripleo | 15:34 | |
trown | derekh: if we were deploying trunk of everything (like all other project CI), CI would be broken if we cant deploy trunk | 15:35 |
bnemec | All of the other projects are co-gating too. | 15:35 |
trown | derekh: I totally understand why that is a bad thing, and would block people getting work done, but it still feels like CI is broken if we cant deploy trunk | 15:36 |
dprince | trown: I always use trunk in my local dev environment | 15:36 |
trown | obviously I am biased given that in order for tripleo to participate in RDO we need tripleo to work with trunk | 15:36 |
dprince | trown: I think derekh usually does too | 15:37 |
*** david-lyle has quit IRC | 15:37 | |
dprince | trown: and now we have periodic jobs on trunk nightly as well | 15:37 |
openstackgerrit | Dmitry Tantsur proposed openstack/python-tripleoclient: Allow 'openstack baremetal configure boot' to guess the root device https://review.openstack.org/288417 | 15:37 |
trown | dprince: which have never passed | 15:37 |
dprince | trown: I would rather have a subset of the team focussed on keeping trunk working | 15:37 |
dprince | trown: they have passed once or twice | 15:37 |
*** panda has quit IRC | 15:37 | |
dprince | trown: http://tripleo.org/cistatus-periodic.html | 15:37 |
dprince | trown: nonha and ceph actually passed last night | 15:38 |
*** panda has joined #tripleo | 15:38 | |
trown | dprince: that is not my argument... I agree on that part of the tripleoci strategy, it is just that the semantic game of "tripleoci is not broken even though it cant deploy trunk" says to me that working with trunk is not a priority for the tripleo project | 15:38 |
slagle | shardy: hey, in overcloud.yaml, in AllNodesValidationConfig, does the fact that Controller is used there in properties mean there is an implicit depends_on? | 15:38 |
*** mbound has quit IRC | 15:38 | |
slagle | shardy: we are seeing ComputeAllNodesValidationDeployment fail a lot in CI, failing to ping the controller ip's | 15:39 |
slagle | shardy: i'm just wondering if maybe the controller isn't fully done with the NetworkDeployment | 15:39 |
trown | dprince: we cant promote on only some jobs passing... | 15:39 |
*** yamahata has quit IRC | 15:40 | |
dprince | trown: agree in general. but do note that the HA job is generally unstable anyway, even with a pinned CI repository | 15:40 |
dprince | trown: so we might occasionally make exceptions to this | 15:40 |
shardy | slagle: Yep, the get_attr is an implicit depends_on | 15:40 |
shardy | so the Controller ResourceGroup must be CREATE_COMPLETE before that runs | 15:41 |
slagle | ok, dang, i guess :) that was my only theory | 15:41 |
trown | dprince: https://review.openstack.org/288460 gets me to CREATE_COMPLETE with HA on trunk | 15:41 |
trown | dprince: though the ping test is failing for me now | 15:41 |
openstackgerrit | Dmitry Tantsur proposed openstack/python-tripleoclient: Allow 'openstack baremetal configure boot' to guess the root device https://review.openstack.org/288417 | 15:45 |
dtantsur | lucasagomes, ^^ | 15:45 |
dtantsur | finished with this one, I guess | 15:45 |
EmilienM | FYI puppet-ceilometer is about to drop alarming code, since it's now in Aodh. We should really consider using Aodh in TripleO from now | 15:47 |
dprince | EmilienM: I noticed aodh-evaluator is raising connection errors with network isolation enabled | 15:49 |
EmilienM | dprince: interesting, do you have a trace I can look at? | 15:50 |
EmilienM | pradk: ^ | 15:50 |
dprince | EmilienM: not ATM. I will get it. Just noticed it as something new... | 15:50 |
dprince | https://etherpad.openstack.org/p/tripleo-mitaka-rc-blockers | 15:50 |
dprince | jistr, marios, shardy, gfidente, jprovazn, slagle, bnemec, derekh | 15:51 |
dprince | regarding the etherpad above, did we reach agreement about the requirements for landing those code changes? | 15:52 |
dprince | This is to continue our tripleO IRC discussion from http://eavesdrop.openstack.org/meetings/tripleo/2016/tripleo.2016-03-01-14.04.log.html | 15:52 |
openstackgerrit | Ben Nemec proposed openstack/tripleo-common: Add capabilities filter for Nova https://review.openstack.org/288087 | 15:53 |
dprince | shardy: We decided to revisit this next week I think but I'm getting pings about making this decision today so I wanted to try and catch people before the weekend if possible | 15:54 |
openstackgerrit | Ben Nemec proposed openstack/instack-undercloud: Configure nova to use custom scheduler filter https://review.openstack.org/288188 | 15:56 |
openstackgerrit | Dmitry Tantsur proposed openstack/python-tripleoclient: Completely removed the old discovery image support https://review.openstack.org/288546 | 15:56 |
jaosorior | Can someone review this cr https://review.openstack.org/#/c/287199/ ? it's needed to solve this bz https://bugzilla.redhat.com/show_bug.cgi?id=1313855 | 15:56 |
openstack | jaosorior: Error: Error getting bugzilla.redhat.com bug #1313855: NotPermitted | 15:56 |
shardy | dprince: I thought we agreed manual testing would be OK, but we didn't reach consensus about whether we can push for cutting an RC at the same time as other projects | 15:56 |
shardy | https://review.openstack.org/#/c/278979/ is now ready for a second reviewer if that helps get one off the list ;) | 15:57 |
trown | I would add not being able to deploy the other RCs would be a blocker too | 15:58 |
dprince | shardy: manual testing with what though? I hear people saying they haven't used upstream TripleO for weeks.... | 15:58 |
bnemec | o.O | 15:58 |
dprince | shardy: manual testing is fine. But is also risky if we aren't all doing the same things | 15:58 |
dprince | shardy: like very few people use network iso I think | 15:59 |
bnemec | Damn, the list of must-haves has gotten quite long. :-( | 15:59 |
dprince | EmilienM: https://bugs.launchpad.net/tripleo/+bug/1553250 | 15:59 |
openstack | Launchpad bug 1553250 in tripleo "aodh-evaluator: ToozConnectionError: Error 113 connecting to 172.16.2.5:6379. No route to host." [Medium,Triaged] | 15:59 |
shardy | dprince: My impression was that we weren't going to block ipv6 and SSL on CI, but they would both get significant manual testing (with upstream code) and we'd push hard on getting the CI in place after the RC | 15:59 |
*** yamahata has joined #tripleo | 15:59 | |
bnemec | I seriously thought it was going to be IPv6 and a few other things that were mostly finished. | 15:59 |
shardy | the alternative is to block that stuff, but be forced to backport it after we branch | 16:00 |
dprince | bnemec: yeah that list grew a bit didn't it :) | 16:00 |
shardy | which will then block all the architectural rework as if we hadn't branched | 16:00 |
*** leanderthal is now known as leanderthal|afk | 16:00 | |
dprince | shardy: without CI I don't think we do architectural rework | 16:00 |
dprince | shardy: CI blocks that too | 16:01 |
shardy | well, we have CI, just not of those new features | 16:01 |
shardy | (and a bunch of other stuff, I know) | 16:01 |
dprince | shardy: right, but we still don't have CI on features from the last release | 16:01 |
shardy | dprince: Ok, what are you proposing? | 16:01 |
openstackgerrit | John Trowbridge proposed openstack/tripleo-common: Use 'yes' hack for ping test stack delete https://review.openstack.org/288511 | 16:02 |
shardy | block those features on CI, cut the RC anyway, then have another cycle where feature backports are permitted? | 16:02 |
dprince | shardy: well, I think I've just been complaining so far :) | 16:02 |
*** Goneri has quit IRC | 16:02 | |
pradk | dprince, EmilienM, i dont think that is specific to aodh-evaluator.. i noticed that with ceilometer as well before the aodh patch merged | 16:02 |
pradk | dprince, EmilienM, see ceilometer/central.log | 16:02 |
shardy | dprince: Whichever way we go, it's not great, but to me getting a clean-slate to work from for Newton is really important | 16:02 |
bnemec | dprince: Our scrum call is starting, so some of us are going to be semi-away for a few minutes. | 16:03 |
pradk | EmilienM, dprince i think it's the tooz setup, see http://paste.openstack.org/show/489348/ | 16:03 |
dtantsur | related question: when is feature freeze for tripleo? | 16:03 |
shardy | dprince: perhaps we do need to clearly communicate the need for CI coverage of all new features from Newton onwards tho? | 16:04 |
jaosorior | EmilienM Thanks for the review | 16:04 |
shardy | (assuming we actually get CI capacity to handle that) | 16:04 |
EmilienM | jaosorior: you don't have to thank me to review code | 16:04 |
dprince | shardy: Yeah, I just hate to see us trash Mitaka in the meantime | 16:04 |
trown | dtantsur: never? | 16:04 |
jaosorior | fair enough | 16:04 |
dtantsur | lol | 16:04 |
dprince | slamming features in quickly without CI is a sure way to break something | 16:04 |
shardy | dprince: which are you most worried about, ipv6? | 16:05 |
lucasagomes | dtantsur, commented inline with a suggestion | 16:05 |
lucasagomes | dtantsur, lemme know what you think | 16:05 |
shardy | dprince: well, it's a sure way to land something which doesn't work or quickly breaks | 16:05 |
dprince | shardy: I'd really like the IPv4 CI job in first | 16:05 |
shardy | but I guess the question is, do we have enough coverage for confidence re regressions | 16:05 |
shardy | sounds like the answer is no | 16:05 |
shardy | dprince: Ok, what's the eta on that? | 16:06 |
dprince | shardy: we don't, IPv4 was broken for weeks. Nobody noticed | 16:06 |
dtantsur | lucasagomes, well, I like neither what I have nor what you suggest :) | 16:06 |
dprince | shardy: maybe today, maybe next week. We are closing in on it | 16:06 |
lucasagomes | dtantsur, hah | 16:06 |
lucasagomes | dtantsur, right... yeah it's tricky | 16:06 |
shardy | dprince: Ok, maybe we block on that then | 16:06 |
dtantsur | lucasagomes, I don't want people to really make lists like sda,vda,... not sure how to detect "first" better though | 16:07 |
lucasagomes | dtantsur, yeah... I think that's because the definition of "first" may be a bit weak there... We can have devices connected to different disk controllers | 16:07 |
shardy | dtantsur: we haven't had a formal feature freeze which is why we're in this situation | 16:07 |
lucasagomes | so which one is the "first"? | 16:07 |
lucasagomes | it sounds a bit bogus | 16:07 |
shardy | dtantsur: I hope we can move to a much stricter model (like other projects) from Newton | 16:08 |
dtantsur | lucasagomes, well, for people I was talking to, first is /dev/sda... (or rather the same behaviour that we had in Kilo, i.e. sda/vda/hda) | 16:08 |
dtantsur | shardy++ | 16:08 |
dtantsur | lucasagomes, so do you think we should allow --detect-root-device=largest, --detect-root-device=smallest, --detect-root-device=sda,vda,hda? | 16:09 |
lucasagomes | dtantsur, right, but that assumption is wrong IMHO... that's why the ordered device list is slightly better, because each operator can input whatever order they consider first to last | 16:09 |
lucasagomes | they can define it | 16:09 |
dtantsur | lucasagomes, i.e. allow a couple of strategies + allow to provide a list | 16:09 |
*** mikelk has quit IRC | 16:09 | |
lucasagomes | dtantsur, yeah the smallest and larger makes total sense | 16:09 |
lucasagomes | dtantsur, I think that way is better, because we can't logically define what is first and what is last (unless all disks are connected to the same disk controller so we can actually look at the physical address) | 16:10 |
dtantsur | lucasagomes, last question: do we need to support full paths, like --detect-root-device=/dev/sda,/dev/hda? | 16:10 |
dtantsur | lucasagomes, I'm afraid of people having some md-based magic with devices not in /dev | 16:11 |
*** mgould has quit IRC | 16:11 | |
lucasagomes | dtantsur, could be yeah... I don't know how the "name" is set in the disks list there | 16:12 |
lucasagomes | but if that's the full device path yeah we can do full path | 16:12 |
dtantsur | lucasagomes, IPA returns /dev/sda etc | 16:12 |
lucasagomes | sounds good then | 16:12 |
dprince | shardy: I've added 2 more items to the list. Which may not be "blockers" but shouldn't get left out | 16:12 |
dtantsur | lucasagomes, hmm, but it does it this way: https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L109 :) | 16:12 |
dprince | shardy: The upgrades CI job, and the IPv4 network iso testing (from last release) | 16:13 |
lucasagomes | dtantsur, hah yeah... well all devices will be under /dev anyway | 16:13 |
lucasagomes | even the ones you look /sys/blocks/by-label etc... are in /dev/ too | 16:13 |
dtantsur | lucasagomes, ok, so no full paths for now then.. we can always enable them later | 16:13 |
lucasagomes | dtantsur, cool | 16:13 |
dprince | shardy: I think we already agreed we'd have the upgrades CI job in place before we did architectural changes anyways. So that blocks newton regardless IMO | 16:13 |
dtantsur | lucasagomes, thanks for review, will post an update soon | 16:14 |
lucasagomes | dtantsur, cool no problem | 16:14 |
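The CLI shape they converged on would be used along these lines (the flag name and strategy values were still under review in the patch, so this is illustrative only):

    # Strategy-based selection:
    openstack baremetal configure boot --detect-root-device=largest
    openstack baremetal configure boot --detect-root-device=smallest
    # Operator-defined ordering by device name (no full /dev/... paths
    # for now, per the discussion above):
    openstack baremetal configure boot --detect-root-device=sda,vda,hda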
*** rdopiera has quit IRC | 16:14 | |
dprince | shardy: if we had CI on those two things I'd be happy to merge the rest of it | 16:14 |
*** Marga_ has quit IRC | 16:17 | |
*** david_lyle__ is now known as david_lyle | 16:17 | |
dprince | shardy: I'd like to take a step back and ask why we are branching again | 16:17 |
dprince | shardy: you want to branch so that it opens up N for architectural work | 16:18 |
dprince | shardy: I'm keen on this too | 16:18 |
shardy | dprince: we are also branching so downstreams (such as RDO) have a release to consume | 16:19 |
slagle | dprince: hey, was in a mtng, let me catch up on the conversation | 16:19 |
shardy | we need to reach the point where we can say "this is mitaka TripleO" | 16:19 |
shardy | just like all other projects | 16:19 |
dprince | but, we've also agreed that we won't make architectural changes without working CI on these things. Upgrades/Network ISO | 16:19 |
bnemec | dprince: So regarding CI on upgrades, I think there's two different issues in play. | 16:19 |
bnemec | One is that trown wants to branch at the same time as everyone else so we have a thing to deploy RDO Mitaka right away. | 16:19 |
shardy | dprince: we can agree trunk is still restricted after we branch pending CI changes, sure | 16:19 |
dprince | shardy: so whether we branch or not, it really is a matter of whether we want to merge code that works | 16:19 |
shardy | but that doesn't mean we don't have to release mitaka IMO | 16:20 |
bnemec | The other is that you want to start working on major architectural changes that will cause backport headaches. | 16:20 |
bnemec | I've been treating the etherpad as addressing the former. | 16:20 |
dprince | shardy: releasing Mitaka without CI on these features (or features from last release) would likely mean releasing a broken Mitaka | 16:20 |
dprince | bnemec: I don't want the backport headaches | 16:21 |
shardy | dprince: we have to release *something*, even if we don't land all those features | 16:21 |
shardy | we can't just say, let's not bother branching | 16:21 |
dprince | shardy: I'm actually fine with that | 16:21 |
shardy | that's not the way the OpenStack release model works | 16:21 |
shardy | (not that TripleO has ever properly respected it) | 16:21 |
trown | lol | 16:21 |
dprince | Either we choose a hard line date, or we choose the set of features we want. Not both | 16:22 |
dprince | This etherpad represents the worst possible compromise in that we are trying to choose both | 16:22 |
trown | I agree to that bit | 16:22 |
shardy | dprince: Yeah, really we have to do better next cycle, and declare a feature freeze, like other projects | 16:22 |
shardy | then we have a better window for stablizing things and ensuring we're happy before releasing | 16:22 |
shardy | dprince: I agree there's too much in the etherpad | 16:23 |
shardy | I was hoping we'd have a small list of stuff, so we could focus review attention | 16:23 |
shardy | instead, it's turned into a feature crunch :( | 16:23 |
dprince | shardy: Yeah. This etherpad is what I've been looking at https://etherpad.openstack.org/p/tripleo-ipv6-support | 16:23 |
*** jaosorior has quit IRC | 16:24 | |
slagle | trown: when are the rdo mitaka repos getting setup? | 16:24 |
slagle | i'd think we would need to branch by then | 16:25 |
slagle | but if the needed features aren't landed, what are we saying? that we allow some feature backport exceptions for mitaka? | 16:25 |
openstackgerrit | Brad P. Crochet proposed openstack/tripleo-common: Upgrades: Add StackUpgradeManager https://review.openstack.org/288568 | 16:26 |
shardy | sounds like we don't have any other option | 16:26 |
slagle | shardy: yea :( | 16:27 |
shardy | then we'll advertise an actual FeatureFreeze for Newton so this doesn't happen again | 16:27 |
trown | slagle: asked apevec in #rdo | 16:27 |
slagle | which means double the backport work, etc | 16:27 |
slagle | but i don't see another way | 16:27 |
slagle | well... | 16:27 |
shardy | slagle: that's still a better outcome than having features backported throughout the entire forthcoming cycle | 16:27 |
slagle | the other way would be to trim the scope of what we need to backport | 16:27 |
dprince | if there is backport work from Newton to Mitaka I don't want to see us hold up new features to make it easier | 16:28 |
slagle | well, i think it would be just what is not landed from this etherpad in master, when we branch mitaka | 16:29 |
*** xinwu has joined #tripleo | 16:29 | |
bnemec | We may have to have two dates - a branch date, and a "no more backports" date. | 16:29 |
trown | that etherpad is huge though | 16:29 |
bnemec | IPv6 and upgrades are the two really scary ones though. | 16:29 |
ayoung | A google search for ruby puppet rabbit is not too informative except if you want a hand puppet of a rabbit | 16:30 |
*** penick has joined #tripleo | 16:30 | |
*** pblaho has quit IRC | 16:30 | |
openstackgerrit | Dmitry Tantsur proposed openstack/python-tripleoclient: Allow 'openstack baremetal configure boot' to guess the root device https://review.openstack.org/288417 | 16:31 |
dtantsur | lucasagomes, ^^ | 16:31 |
dprince | derekh: should I have rebased my patch to pick up your latest tripleo-ci fix to cleanup the VMs? | 16:32 |
dprince | derekh: I noticed the ceph job also failed due to an Ironic deployment failure (the Heat server resource failed to deploy) | 16:33 |
derekh | dprince: its merged so your patch would be rebased by zuul wouldn't it? checking | 16:33 |
dprince | derekh: perhaps, but then that means the cleanup isn't working yet. | 16:34 |
*** aufi has quit IRC | 16:34 | |
*** mgould has joined #tripleo | 16:34 | |
*** ifarkas has quit IRC | 16:37 | |
derekh | dprince: your ceph ci run had gotten the patch and it doesn't look like an ironic failure to me | 16:38 |
derekh | 2016-03-04 16:27:59.448 | | CephStorageAllNodesValidationDeployment | e98082e6-3453-48c5-a229-af5af3ea6a0b | OS::Heat::StructuredDeployments | CREATE_FAILED | 2016-03-04T16:05:34 | | 16:39 |
dprince | derekh: oh, maybe I got mixed up | 16:39 |
slagle | oh, that sounds like the bug i just filed | 16:39 |
trown | slagle: dprince, from #rdo [11:38:38] <apevec> trown, we'll not release without tripleo | 16:39 |
slagle | derekh: https://bugs.launchpad.net/tripleo/+bug/1553243 | 16:40 |
openstack | Launchpad bug 1553243 in tripleo "CI: Resource CREATE failed: Error: resources.ComputeAllNodesValidationDeployment.resources[0]" [Critical,In progress] - Assigned to James Slagle (james-slagle) | 16:40 |
slagle | same thing looks like | 16:40 |
dprince | slagle: yea, yours is different | 16:40 |
slagle | i think i saw this a few times before the testenv redeploy...so not sure it's related to that | 16:40 |
dprince | slagle: sorry, yours is the same issue I hit w/ Ceph | 16:40 |
trown | slagle: dprince, however the goal of RDO is always to release within 2 weeks of GA and I would not like to hold that up if possible | 16:40 |
derekh | slagle: dprince yup, thats the issue I'm looking at | 16:41 |
derekh | trown: Do you also know when an initial Mitaka branch will be available, although not necessarily released | 16:43 |
trown | derekh: in what context? we could create rdo-mitaka packaging branches now if we wanted to | 16:43 |
trown | derekh: you mean for the rest of openstack? I would guess as soon as RCs start popping up | 16:44 |
derekh | trown: ok | 16:44 |
*** yamahata has quit IRC | 16:44 | |
dtantsur | derekh, trown, for openstack: oslo libraries are getting mitaka branches pretty soon, then clients. services after they get the 1st RC | 16:45 |
derekh | trown: the context I was wondering about is that whenever we do create branches, we would want to use a mitaka repository to test them against | 16:45 |
*** yamahata has joined #tripleo | 16:45 | |
trown | and not being able to deploy the rest of openstack is a clear blocker for release | 16:45 |
*** penick has quit IRC | 16:46 | |
trown | derekh: I think that delorean trunk is fine for that | 16:46 |
dprince | slagle: I'm still wondering if there is a subtle cleanup bug that is causing 1553243 | 16:47 |
dprince | slagle: a testenv-cleanup bug | 16:47 |
slagle | dprince: oh like another vm is holding onto the IP? | 16:47 |
dprince | slagle: or a bridge or something | 16:47 |
slagle | yea could be | 16:47 |
derekh | trown: maybe, couldn't we get bitten by projects removing deprecated code for example | 16:48 |
openstackgerrit | Carlos Camacho proposed openstack/tripleo-heat-templates: Remove forced rabbitmq::file_limit conversion to string https://review.openstack.org/232983 | 16:48 |
slagle | trown: i'm not sure about testing our mitaka branches against trunk delorean, seems like we'd just have all the issues we currently do with trunk chasing | 16:48 |
derekh | we would need a mitaka repository to make sure the APIs we talk to stay stable | 16:49 |
trown | slagle: derekh, ah I see, well as soon as we start seeing mitaka RCs we can branch delorean | 16:49 |
trown | because just like liberty it will try to build the configured release (mitaka in this case) and fall back to master if there is not a stable/mitaka branch yet | 16:50 |
derekh | trown: cool, makes sense, | 16:50 |
slagle | sounds reasonable | 16:50 |
slagle | that will at least give us some stability | 16:50 |
jistr | ccamacho: cool, thanks for submitting the rebase! | 16:51 |
ccamacho | :) Thank you for the help! | 16:51 |
*** rcernin has quit IRC | 16:51 | |
*** jcoufal has quit IRC | 16:53 | |
bnemec | gdi, just tripped the breaker in my office again. | 16:57 |
* bnemec should have had them wire it for 50 amps | 16:57 | |
slagle | shut some of those lava lamps off | 16:58 |
trown | jistr, there may have been some other cruft in my environment causing the ping test to fail... I just redeployed with your patch `virt-customize`'d onto my undercloud.qcow2 and the ping test even passed. | 16:58 |
trown | which means we are super close to being able to deploy trunk again | 16:58 |
shardy | \o/ | 16:58 |
jistr | neat :) | 16:59 |
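Patching a pre-built image like that looks roughly as follows (virt-customize ships with libguestfs; the patch file name and template path are placeholders):

    # Inject an unmerged tripleo-heat-templates change into the
    # undercloud image without rebuilding it from scratch.
    virt-customize -a undercloud.qcow2 \
        --upload 288460.patch:/tmp/288460.patch \
        --run-command 'cd /usr/share/openstack-tripleo-heat-templates && patch -p1 < /tmp/288460.patch'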
trown | my yes hack for tripleo.sh is not working very well... it causes things not to hang, but instead to exit 1 even though the ping test succeeded | 16:59 |
*** devvesa has quit IRC | 16:59 | |
*** dsneddon has quit IRC | 16:59 | |
*** jobewan has joined #tripleo | 17:00 | |
bnemec | slagle: It also turns out that my UPS is utterly worthless. The last two times this has happened it just sits there beeping at me, with the battery at 100% but 0 output voltage. | 17:00 |
derekh | sounds uninterruptible to me | 17:02 |
slagle | did you plug your stuff into the correct side? | 17:02 |
*** dsneddon has joined #tripleo | 17:03 | |
*** Goneri has joined #tripleo | 17:04 | |
*** jobewan has quit IRC | 17:05 | |
*** absubram has joined #tripleo | 17:13 | |
*** absubram_ has joined #tripleo | 17:14 | |
*** absubram has quit IRC | 17:18 | |
*** absubram_ is now known as absubram | 17:18 | |
*** trown is now known as trown|lunch | 17:20 | |
*** fgimenez has quit IRC | 17:20 | |
*** xinwu has quit IRC | 17:22 | |
*** masco has quit IRC | 17:23 | |
openstackgerrit | Karim Boumedhel proposed openstack/puppet-pacemaker: When using the Rhevm stonith mechanism, fence packages could already be referenced somewhere else in the code, so such a requirement shouldn't sit in the defined type. Constraints on the stonith resources are also unnecessary https://review.openstack.org/288527 | 17:26 |
openstackgerrit | Karim Boumedhel proposed openstack/puppet-pacemaker: puppet-pacemaker rhevm stonith fails https://review.openstack.org/288527 | 17:28 |
*** dtantsur is now known as dtantsur|afk | 17:32 | |
*** yamahata has quit IRC | 17:33 | |
*** mbound has joined #tripleo | 17:36 | |
*** xinwu has joined #tripleo | 17:37 | |
derekh | dprince: you got 2 of those "is not pingable" errors and a db connection error; recheck | 17:40 |
derekh | slagle: is that error intermittent ? | 17:40 |
dprince | derekh: yeah, this not pingable error concerns me | 17:40 |
dprince | derekh: any ideas? I just reviewed the testenv's and I don't see anything jumping out at me | 17:41 |
*** gfidente has quit IRC | 17:42 | |
derekh | dprince: no clue, according to logstash its only happened 3 times in the last week, although it may not have caught up yet http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20pingable.%20Local%20Network%5C%22 | 17:44 |
*** jistr has quit IRC | 17:46 | |
slagle | derekh: yes, it's transient | 17:47 |
*** jprovazn has quit IRC | 17:48 | |
*** ohamada has quit IRC | 17:48 | |
slagle | i think i saw it earlier this week before the testenv redeploy | 17:48 |
*** dshulyak has left #tripleo | 17:51 | |
*** trown|lunch is now known as trown | 17:53 | |
trown | slagle: any chance we can get a quick +A on https://review.openstack.org/#/c/288460 it passed CI and dprince already +2 | 17:53 |
trown | I think that is the only trunk blocker | 17:53 |
slagle | there's always a chance man | 17:53 |
trown | :) | 17:53 |
*** lucasagomes is now known as lucas-beer | 17:54 | |
derekh | trown: I'm just after remembering something you said earlier about not knowing how trunk nonha passed | 17:57 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Run keystone-manage bootstrap for HA deployment too https://review.openstack.org/288460 | 17:57 |
slagle | in general our CI jobs seem much slower after the testenv redeploy | 17:57 |
slagle | is it just me imagining things? | 17:57 |
trown | derekh: I think it is because ansible is getting a tty | 17:58 |
derekh | trown: looks like the pingtest passed, then failed to delete the tenant stack | 17:58 |
derekh | 2016-03-04 08:27:00.894 | tripleo.sh -- Overcloud pingtest - time out waiting to delete tenant heat stack, please check manually | 17:58 |
trown | derekh: ya, that is different, and fixed by a neutronclient patch today | 17:58 |
derekh | trown: ok | 17:58 |
trown | derekh: RDO CI was hanging waiting for 'y' confirmation to heat stack-delete | 17:59 |
derekh | trown: got ya | 17:59 |
trown | derekh: my super elegant fix :p : https://github.com/redhat-openstack/tripleo-quickstart/commit/447f127f34dbf4069937581c11d2698ac116fd30 | 18:00 |
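The hack amounts to piping confirmations into the client so the new interactive prompt cannot block (a sketch; the linked commit may differ in detail):

    # Newer heatclient prompts for confirmation on stack-delete and
    # hangs when nothing answers on stdin.
    yes | heat stack-delete "$STACK_NAME"
    # Caveat: under `set -o pipefail`, `yes` dying of SIGPIPE makes the
    # whole pipeline exit non-zero even on success, which would explain
    # the "exit 1 even though the ping test succeeded" symptom above.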
derekh | slagle: I hadn't noticed but maybe, looks like you could be correct, once a few more jobs finish we'll have a better idea | 18:00 |
bnemec | slagle: The patch I'm watching just got to introspection, and it's been running for two hours. :-/ | 18:01 |
*** mbound has quit IRC | 18:02 | |
slagle | derekh: we did land the patch to enable swap, but a few jobs ran with that before the testenv deploy in their normal times | 18:02 |
slagle | iirc | 18:02 |
slagle | the swap enablement itself takes about 3 minutes | 18:02 |
slagle | but if we are now heavily going into swap...that could explain the slow down | 18:02 |
*** tosky has quit IRC | 18:02 | |
*** electrofelix has quit IRC | 18:03 | |
*** yamahata has joined #tripleo | 18:04 | |
*** ccamacho has quit IRC | 18:06 | |
derekh | slagle: ok, it could be any number of things taking up the time, a profile comparison of a fast test vs a slow one could help to find out | 18:08 |
*** shivrao has joined #tripleo | 18:10 | |
*** derekh has quit IRC | 18:11 | |
bnemec | Ugh, our undercloud log tar went from ~7 MB yesterday to 30 today. | 18:13 |
bnemec | Did RDO start logging to systemd today or something? | 18:13 |
bnemec | The journal files are taking up most of the space. | 18:14 |
trown | all the openstack services have been logging to the journal since I started working on openstack | 18:20 |
*** mbound has joined #tripleo | 18:20 | |
openstackgerrit | Ben Nemec proposed openstack-infra/tripleo-ci: Enable undercloud ssl on nonha job https://review.openstack.org/273743 | 18:20 |
trown | which was not yesterday :P | 18:20 |
*** rbrady has quit IRC | 18:21 | |
*** mbound has quit IRC | 18:24 | |
openstackgerrit | Steven Hardy proposed openstack/tripleo-common: tripleo.sh - build puppet modules from source for stable branches https://review.openstack.org/288626 | 18:25 |
*** shakamunyi has joined #tripleo | 18:29 | |
*** tzumainn has quit IRC | 18:29 | |
*** tzumainn has joined #tripleo | 18:29 | |
openstackgerrit | Merged openstack/instack-undercloud: mysqld config: set innodb_file_per_table to ON https://review.openstack.org/285227 | 18:31 |
*** penick has joined #tripleo | 18:37 | |
*** akrivoka has quit IRC | 18:38 | |
*** cwolferh has quit IRC | 18:38 | |
*** mgould has quit IRC | 18:40 | |
*** pcaruana has quit IRC | 18:41 | |
slagle | looking at the logs from one of the ha jobs that took 2h44m, one controller was 400mb into swap and the other 2 were ~100mb into swap | 18:42 |
slagle | that doesnt seem so bad | 18:42 |
slagle | but maybe it is | 18:42 |
*** xinwu has quit IRC | 18:43 | |
slagle | undercloud 1.2G into swap | 18:43 |
slagle | wait, how much ram should the underclouds have? 5gb or 6gb? | 18:45 |
slagle | i thought we bumped it to 6 | 18:45 |
slagle | or did we only bump it to 5? | 18:45 |
*** xinwu has joined #tripleo | 18:46 | |
slagle | they only have 5, and i think that's a big part of the problem, they are using too much swap | 18:46 |
bnemec | slagle: Only 5, but there are 2 GB of swap now. | 18:46 |
slagle | i'm on one right now | 18:46 |
slagle | and it's slow as molasses | 18:46 |
slagle | nova list takes 2 minutes | 18:46 |
bnemec | For some reason, swift-proxy sometimes seems to cache all of the images in memory, so it ends up eating over 1 GB of memory itself. | 18:46 |
bnemec | For a service we barely use. :-/ | 18:46 |
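A quick way to see which processes are actually responsible on a box in that state (plain Linux /proc, nothing TripleO-specific):

    # VmSwap is reported per process in /proc/<pid>/status; sort
    # numerically to surface the biggest swap consumers.
    grep VmSwap /proc/[0-9]*/status 2>/dev/null | sort -k2 -n | tail -5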
ayoung | If I did a tripleo undercloud install, how can I redo it without blowing away the instack vm? | 18:49 |
*** xinwu has quit IRC | 18:50 | |
*** athomas has quit IRC | 18:51 | |
slagle | in theory, you can just rerun the installer | 18:53 |
*** ccamacho has joined #tripleo | 18:53 | |
slagle | but there is a bug right now preventing that | 18:53 |
trown | there are also some things that are not redoable | 18:55 |
trown | since not everything is puppet | 18:55 |
slagle | trown: you mean they fail when you rerun them? | 18:56 |
bnemec | I have some commands that attempt to wipe an installed undercloud without rebuilding it, but they only work about 50% of the time in my experience. | 18:56 |
trown | slagle: no, they just don't change | 18:56 |
slagle | trown: oh i see | 18:56 |
bnemec | It's what I do when I get to the point of needing to rebuild, but don't want to completely start from scratch. | 18:56 |
trown | I am failing to think of an example, but I have hit it before | 18:56 |
slagle | yea, the coveted openstack uninstaller | 18:57 |
slagle | practically the most requested rfe | 18:57 |
trown | if only there were a way to quickly deploy from a pre-built qcow2 | 18:57 |
slagle | if only :) | 18:57 |
bnemec | Yeah, cause that wouldn't run afoul of idempotency issues... ;-) | 18:58 |
trown | bnemec: if the undercloud has not been installed yet, and the image is just all the packages, it won't :) | 18:59 |
trown | bnemec: tripleo-quickstart on my dell mini goes from nothing to ready to deploy in about 10 min | 18:59 |
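For reference, that tripleo-quickstart path is a one-liner against a virt host (usage as documented in the project's README at the time; details may have shifted since):

    # Fetches a pre-built undercloud.qcow2 and boots it on the target
    # host instead of building images locally.
    bash quickstart.sh $VIRTHOST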
*** isq has joined #tripleo | 19:09 | |
trown | dib elements are a pretty awful way to install packages :), so many yum transactions | 19:09 |
dprince | I'm +2 on the initial IPv6 patch. Tested it locally.... | 19:10 |
dprince | https://review.openstack.org/#/c/235423/ | 19:10 |
*** mburned is now known as mburned_out | 19:10 | |
*** xinwu has joined #tripleo | 19:10 | |
*** xinwu has quit IRC | 19:13 | |
*** ccamacho has quit IRC | 19:16 | |
EmilienM | we miss a last +A now :) | 19:21 |
EmilienM | dprince: we need to land https://review.openstack.org/#/c/278979/ first | 19:22 |
*** ccamacho has joined #tripleo | 19:23 | |
bnemec | trown: From what I hear, puppet isn't any better. And actually, if we had ever bothered to convert all of the package installs in dib to the new methods, I think it would be better. | 19:25 |
dprince | EmilienM: done | 19:25 |
bnemec | The problem is we have a bunch of elements still doing independent installs. I believe the non-deprecated method actually rolls all of the installs into one so you don't have to run yum once per element. | 19:26 |
EmilienM | dprince: thx | 19:26 |
trown | bnemec: ah that would be better | 19:28 |
trown | bnemec: for building the undercloud.qcow2 for RDO, I just have a big list of every package and pre-install them | 19:28 |
openstackgerrit | Merged openstack/tripleo-heat-templates: Allow for usage of pre-allocated IPs for the management network https://review.openstack.org/278979 | 19:29 |
bnemec | trown: Yeah, at some point we should audit the elements we use and make sure they all get converted to https://github.com/openstack/diskimage-builder/tree/master/elements/package-installs | 19:30 |
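Converting an element mostly means moving package names out of per-element install scripts and into the declarative list that diskimage-builder batches into a single transaction (a rough before/after with a made-up element name):

    # Before: my-element/install.d/50-my-element runs its own transaction:
    yum -y install my-package
    # After: my-element/package-installs.yaml simply declares
    #   my-package:
    # and the package-installs element installs everything in one pass.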
*** ccamacho has quit IRC | 19:33 | |
*** ccamacho has joined #tripleo | 19:34 | |
*** jcoufal has joined #tripleo | 19:36 | |
*** panda has quit IRC | 19:38 | |
*** panda has joined #tripleo | 19:38 | |
*** ccamacho has quit IRC | 19:40 | |
EmilienM | bnemec: can you look https://review.openstack.org/#/c/235423/ when you have time? | 19:40 |
*** tzumainn has quit IRC | 19:41 | |
trown | bnemec: ya, there are also a bunch of elements that have a single line of bash, which seems an inefficient way to construct an image too | 19:46 |
trown | some of them are just hiding packaging bugs: https://github.com/redhat-openstack/tripleo-quickstart/blob/master/playbooks/roles/images/build/templates/dib-workaround-default.sh.j2 | 19:48 |
*** mburned_out is now known as mburned | 19:50 | |
dprince | slagle: could this issue be related to the swap file? | 19:51 |
trown | 0 to "ping test validated HA overcloud" in 35 min. http://fpaste.org/334085/45712108/ | 19:51 |
slagle | dprince: which? :) | 19:52 |
dprince | slagle: sorry, this one https://bugs.launchpad.net/tripleo/+bug/1553243 | 19:52 |
openstack | Launchpad bug 1553243 in tripleo "CI: Resource CREATE failed: Error: resources.ComputeAllNodesValidationDeployment.resources[0]" [Critical,In progress] - Assigned to James Slagle (james-slagle) | 19:52 |
*** shivrao has quit IRC | 19:52 | |
dprince | slagle: we just turned that on yesterday right? | 19:52 |
slagle | dprince: i think we turned it on wednesday | 19:53 |
slagle | related? i don't see how, but anything is possible if things are just running really slow | 19:53 |
slagle | i don't think we would have gone into swap during the validationdeployment | 19:54 |
slagle | as none of openstack is started yet | 19:54 |
slagle | things are just crawling in CI, it could be the swap file | 19:55 |
dprince | slagle: yeah, super slow | 19:55 |
slagle | but even now, the job i'm looking at, it's only at initially deploying the oc, and the undercloud has a load of 20 | 19:55 |
slagle | the swap change did merge yesterday actually, https://review.openstack.org/#/c/286793/ | 19:57 |
dprince | trown: back in the day I shot for 25 minutes w/ smokestack | 19:57 |
dprince | trown: and I had it under 20 at times | 19:57 |
trown | dprince: nonha can do that :) | 19:57 |
dprince | trown: yeah, I wasn't using HA | 19:57 |
trown | ha on a 32G host is just tight resource-wise | 19:57 |
slagle | but if you look at the CI on that patch, the times are normal | 19:57 |
*** weshay has quit IRC | 19:57 | |
dprince | slagle: could be because it was just a single run using it | 19:58 |
dprince | slagle: and now in parallel it is getting messy | 19:58 |
dprince | slagle: it is hard to say, we've had so much changed this week. Like I rebuilt the testenv's yesterday | 19:58 |
dprince | slagle: new code... plus this swap patch | 19:59 |
slagle | we could try backing it out, but we'd have to revert the aodh patch as well | 19:59 |
slagle | otherwise, nothing will pass | 19:59 |
dprince | slagle: yeah. I'm gonna say aodh isn't the priority right now. | 20:00 |
dprince | slagle: I get everyone wants their patch in, but we are in a bad place at the moment | 20:00 |
EmilienM | I don't know why we need ceilometer & aodh on the undercloud | 20:01 |
EmilienM | I don't see any use case | 20:01 |
slagle | EmilienM: this was the overcloud patch | 20:01 |
slagle | it drove up memory just enough so that stuff was getting oom killed | 20:02 |
EmilienM | we need to reduce workers to 2 for all we can | 20:02 |
EmilienM | we had this problem in puppet CI (at a lower scale, ok) but we managed by reducing workers to 2 | 20:03 |
EmilienM | not sure how much workers are set by default for openstack services | 20:03 |
trown | EmilienM: I think we already reduce them all to 1 | 20:03 |
EmilienM | on both uc/oc? for all services? | 20:03 |
trown | just oc I think, looking for where that heat environment is | 20:04 |
openstackgerrit | James Slagle proposed openstack/tripleo-heat-templates: WIP: Revert "Deploy Aodh services, replacing Ceilometer Alarm" https://review.openstack.org/288714 | 20:05 |
dprince | EmilienM: we may be able to reduce some workers... but stevebaker specifically wanted Heat engine at 4 workers for us in the undercloud. That is a Gig | 20:05 |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: WIP: Revert "Use swapfile environment in CI" https://review.openstack.org/288716 | 20:06 |
slagle | i guess we can see what happens with that ^^ | 20:06 |
dprince | slagle: I can swapoff the testenv's myself | 20:06 |
dprince | slagle: that could be part of the issue | 20:06 |
*** mburned is now known as mburned_out | 20:06 | |
trown | EmilienM: found it https://github.com/openstack-infra/tripleo-ci/blob/master/test-environments/worker-config.yaml | 20:06 |
EmilienM | the swap is only useful when you have SSD imho | 20:06 |
*** derekh has joined #tripleo | 20:06 | |
dprince | slagle: derekh just enabled swap yesterday I think | 20:07 |
EmilienM | if you enable swap on slow disk, things can be worse than before | 20:07 |
dprince | EmilienM: agree | 20:07 |
EmilienM | trown: thanks! I was trying to find it, you are faster :-P | 20:07 |
* dprince is ready to run swapoff on all the nodes | 20:07 |
derekh | slagle: was thinking about what you mentioned while at dinner, something occurred to me, popping on to mention it | 20:07 |
derekh | dprince: stop | 20:07 |
trown | EmilienM: it moved, had to look up the review that moved it | 20:07 |
dprince | derekh: stopped | 20:08 |
derekh | dprince: slagle 2016-03-04 16:41:13.369 | + ./testenv-client -b 192.168.1.1:4730 -t 10200 -- ./toci_instack.sh | 20:08 |
slagle | i hope i wasn't drunk at this dinner | 20:08 |
derekh | 2016-03-04 17:44:04.691 | 2016-03-04 17:44:04,663 - testenv-client - INFO - Received job : { | 20:08 |
derekh | the jobs are waiting up to an hour to get testenvs | 20:08 |
derekh | dprince: so we got a shortage of testenvs | 20:08 |
*** rbrady has joined #tripleo | 20:08 | |
dprince | derekh: I just created 1 more | 20:08 |
derekh | dprince: I assume you were about to swapoff the testenvs? | 20:08 |
EmilienM | trown: 1 worker is a bit low imho - 2 would be ideal | 20:09 |
EmilienM | trown: 1 worker might expose you to some api timeouts | 20:09 |
dprince | derekh: IPMI isn't working on 3-4 of them (we need to get them fixed) | 20:09 |
EmilienM | do we have api timeouts? | 20:09 |
dprince | derekh: I was going to try to swapoff | 20:09 |
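(What dprince is proposing is the stock swapoff; on each testenv host it would look roughly like this:)

```bash
sudo swapoff -a   # disable all swap devices and files
swapon -s         # should print nothing once swap is off
free -h           # the Swap line should now read 0
```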
derekh | dprince: they aren't running slowly, we just haven't enough | 20:09 |
trown | EmilienM: I have not hit that, I use the same environment in RDO CI | 20:10 |
dprince | derekh: the reason why is I just tried it again on testenv18 (my testbed) and it failed there with this ComputeAllNodesValidation error | 20:10 |
dprince | derekh: I didn't see that yesterday | 20:10 |
dprince | derekh: or last night rather | 20:10 |
*** Marga_ has joined #tripleo | 20:10 | |
trown | dprince: it's not the python-ipaddr package missing, is it? | 20:11 |
trown | dprince: I hit that after we set -e the validation script | 20:11 |
bnemec | trown: So the issue I'm having with deleting heat stacks seems to only happen with network isolation enabled. Were you using it when you had problems? | 20:11 |
derekh | dprince: so maybe we're talking about two different things, I'm talking about the fact that slagle mentioned jobs were running slower since the rebuild | 20:11 |
dprince | trown: I had another patch to fix that via overcloud-base in tripleo-puppet-elements. It landed | 20:11 |
derekh | dprince: and now are timing out completely (all jobs) | 20:11 |
trown | bnemec: nope, my issues were sans net iso | 20:11 |
dprince | derekh: perhaps 2 separate issues | 20:12 |
derekh | dprince: the reason for that is that all of them are waiting for over an hour to get a testenv | 20:12 |
pradk | slagle, is the aodh patch causing the gate failure? it was passing yesterday though? | 20:12 |
dprince | derekh: the slowness isn't helping debug the other one | 20:12 |
dprince | derekh: I just brought up the swap change as it was also new this week | 20:12 |
*** weshay has joined #tripleo | 20:12 | |
bnemec | trown: Okay, so my problem is most likely a different bug. I'll open another one to separate my issue from yours. | 20:13 |
slagle | pradk: it's just a theory that enabling swap slowed things down | 20:13 |
derekh | dprince: we only have 31 testenvs, we need about 50 | 20:13 |
derekh | [heat-admin@testenv18-testenv0-avobrytykidu ~]$ nc 192.168.1.1 4730 | grep lock | 20:13 |
derekh | status | 20:13 |
derekh | lockenv 31 31 48 | 20:13 |
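(derekh is using gearman's plain-text admin interface here: sending the `status` verb to port 4730 returns one line per registered function, with columns for function name, total jobs, running jobs, and available workers per the gearman admin protocol. A non-interactive version of the same query:)

```bash
# Ask the gearman server for queue status and filter for the testenv lock function.
printf 'status\n' | nc 192.168.1.1 4730 | grep lockenv
```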
trown | bnemec: my issue was fixed with the python-neutronclient patch that landed btw | 20:13 |
dprince | derekh: I can fire up some more | 20:14 |
bnemec | trown: Yeah, I'm not running against trunk though, and I think my error is internal to Heat. | 20:14 |
trown | ah k, probably different issue then | 20:14 |
dprince | derekh: but I think I'd still like to consider dropping the swap for now | 20:14 |
derekh | dprince: that should fix all these timeouts we're getting http://tripleo.org/cistatus.html | 20:14 |
pradk | slagle, ah ok, so since this depends on the swap patch, we're reverting this too? | 20:14 |
derekh | dprince: why? what's wrong with the swap? | 20:14 |
derekh | dprince: we've had it for months | 20:15 |
slagle | derekh: different patch | 20:15 |
slagle | we enabled swap on the oc nodes | 20:15 |
dprince | derekh: yeah, for aodh | 20:15 |
derekh | dprince: slagle sorry, I'm still talking about testenvs, do what ye want with the other swap ;-) | 20:16 |
slagle | let's see where the extra testenvs get us | 20:16 |
dprince | derekh: I'll fire up a few more now | 20:16 |
jpeeler | thrash: did you have any plans on pushing this review through - https://review.openstack.org/#/c/235569/ | 20:16 |
dprince | derekh: like I said, old nova baremetal isn't playing nicely with all the IPMIs, so it's taking some time | 20:17 |
derekh | dprince: cool, jobs should at least stop timing out then and you can see real errors | 20:17 |
thrash | jpeeler: I'm trying. :) | 20:17 |
derekh | dprince: yup, it's a pain in the ass | 20:17 |
derekh | dprince: slagle ok I gotta run, just jumped in to mention what I noticed, ttyl | 20:17 |
slagle | thx | 20:18 |
*** derekh has quit IRC | 20:18 | |
openstackgerrit | Brad P. Crochet proposed openstack/tripleo-common: Build image files from definitions in yaml https://review.openstack.org/235569 | 20:19 |
*** jcoufal has quit IRC | 20:19 | |
jpeeler | thrash: ok. if you're able to get your patch in and then i get mine in, it'll save me a lot of documentation work! thanks for staying on it | 20:20 |
thrash | jpeeler: fixed the requirement (thought I had already TBH) | 20:20 |
*** rbrady_ has joined #tripleo | 20:20 | |
*** rbrady has quit IRC | 20:21 | |
*** xinwu has joined #tripleo | 20:24 | |
jayg | anyone around who could give a second +2 to a small puppet-tripleo backport? https://review.openstack.org/#/c/287974/ slagle approved the version for master yesterday | 20:29 |
bnemec | trown: Ah, with trunk Heat I can delete my overcloud again. So whatever bug I'm hitting in tripleo-current seems to be fixed already. | 20:29 |
trown | nice | 20:30 |
slagle | bnemec: i was just about to ask if you solved that yet | 20:30 |
slagle | so just update to current? | 20:30 |
bnemec | slagle: I added ",openstack-heat-common,openstack-heat-api,python-heatclient,openstack-heat-api-cfn,openstack-heat-engine,openstack-heat-templates" to includepkgs in delorean-current.repo | 20:31 |
bnemec | Then yum update openstack-heat-engine. | 20:31 |
*** thrash is now known as thrash|brb | 20:31 | |
slagle | thanks, will try | 20:32 |
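(Scripted, bnemec's workaround might look like the following; the repo file path and the sed invocation are assumptions, since he edited includepkgs by hand:)

```bash
# Let the heat packages through the delorean-current repo filter...
sudo sed -i 's/^includepkgs=.*/&,openstack-heat-common,openstack-heat-api,python-heatclient,openstack-heat-api-cfn,openstack-heat-engine,openstack-heat-templates/' \
  /etc/yum.repos.d/delorean-current.repo
# ...then pull in the fixed heat-engine from trunk.
sudo yum update -y openstack-heat-engine
```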
jayg | thanks slagle! | 20:33 |
*** bvandenh has quit IRC | 20:33 | |
*** jtomasek has quit IRC | 20:33 | |
openstackgerrit | Merged openstack/puppet-tripleo: loadbalancer: fix Redis timeout HAproxy config https://review.openstack.org/287974 | 20:36 |
*** weshay has quit IRC | 20:39 | |
*** rbrady_ has quit IRC | 20:42 | |
*** ccamacho has joined #tripleo | 20:42 | |
stevebaker | dprince, EmilienM: what we really need is undercloud heat-engine workers to be tuned to the expected size of the overcloud. Also, raising the number of workers happened before we discovered the rpc timeout regressions, so some of those failures on small (1 core) underclouds may not have been deadlocks | 20:44 |
dprince | stevebaker: think we should maybe try to go back down to 2 for a default? | 20:45 |
stevebaker | dprince: the default is unset, which creates a worker per core | 20:46 |
dprince | stevebaker: oh, right. so maybe we do just need to pin lower than that if we want/need to | 20:47 |
dprince | stevebaker: for CI | 20:47 |
stevebaker | dprince: I mean now the default is max(4, cores) | 20:49 |
openstackgerrit | Brad P. Crochet proposed openstack/python-tripleoclient: Upgrades: Add 'stack upgrade' command https://review.openstack.org/286606 | 20:50 |
openstackgerrit | Brad P. Crochet proposed openstack/python-tripleoclient: Upgrades: Add --post option to 'stack upgrade' https://review.openstack.org/288728 | 20:50 |
stevebaker | dprince: It used to be (cores), maybe upstream could change to max(2, cores) | 20:50 |
dprince | stevebaker: exactly | 20:51 |
stevebaker | dprince: or instack-undercloud should just infer an appropriate value to set heat.conf num_engine_workers to | 20:51 |
stevebaker | with its knowledge of memory/cores | 20:52 |
stevebaker | and its certainty that an undercloud is a single host heat | 20:52 |
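(A sketch of the inference stevebaker is suggesting, with made-up thresholds; crudini is assumed to be available on the undercloud, and num_engine_workers lives in heat.conf's DEFAULT section:)

```bash
# Derive heat engine workers from cores and memory, floored at 2,
# since the undercloud is a known single-host heat.
cores=$(nproc)
mem_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
workers=$(( cores < mem_gb / 2 ? cores : mem_gb / 2 ))
(( workers < 2 )) && workers=2
sudo crudini --set /etc/heat/heat.conf DEFAULT num_engine_workers "$workers"
sudo systemctl restart openstack-heat-engine
```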
*** weshay has joined #tripleo | 20:53 | |
stevebaker | dprince: it's probably not appropriate to set the upstream heat default to what the undercloud needs - it's not exactly a typical production heat setup | 20:53 |
dprince | stevebaker: I agree there | 20:54 |
stevebaker | (single stack, single host) | 20:54 |
openstackgerrit | Ryan Hallisey proposed openstack/tripleo-heat-templates: Parameterize the heat-docker-agents image https://review.openstack.org/288731 | 20:54 |
stevebaker | dprince: I would suggest CI set num_engine_workers to 2 via ansible, and keep an eye on rpc timeouts | 20:56 |
trown | hehe at "via ansible" | 20:56 |
trown | via bash I think | 20:56 |
stevebaker | trown: oh, I assumed there was ansible all over | 20:57 |
trown | RDO and downstream yes... tripleoci not so much | 20:57 |
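(The "via bash" version of stevebaker's suggestion is a couple of lines; again crudini is an assumption about what's on the image, and the log path is the stock RDO location:)

```bash
sudo crudini --set /etc/heat/heat.conf DEFAULT num_engine_workers 2
sudo systemctl restart openstack-heat-engine
# "keep an eye on rpc timeouts": count oslo.messaging timeouts in the engine log
sudo grep -c MessagingTimeout /var/log/heat/heat-engine.log
```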
stevebaker | also is there memory pressure on the small overcloud controllers? | 20:57 |
stevebaker | crap, puppet-heat doesn't have an option for num_engine_workers | 20:58 |
*** ccamacho has quit IRC | 20:58 | |
trown | stevebaker: ya have to use heat::config::heat_config | 20:59 |
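(Since puppet-heat at the time had no num_engine_workers parameter, the heat::config::heat_config route trown mentions passes raw heat.conf options through, keyed as section/option. A sketch of how that might be wired into a heat environment file; the filename and surrounding structure are assumptions:)

```bash
# Raw config passthrough: each key maps onto a [section] option in heat.conf.
cat > heat-workers.yaml <<'EOF'
parameter_defaults:
  controllerExtraConfig:
    heat::config::heat_config:
      DEFAULT/num_engine_workers:
        value: 1
EOF
```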
stevebaker | trown: ok, that could always be set to 1 overcloud heat-engine worker per controller (or whatever the tempest tests need to run) | 21:00 |
*** cwolferh has joined #tripleo | 21:01 | |
stevebaker | trown, dprince: does a ci undercloud really need 13 nova-api and 6 nova-conductor? | 21:02 |
dprince | stevebaker: no, where do you see those? | 21:02 |
stevebaker | dprince: my random undercloud | 21:02 |
stevebaker | dprince: the last puddle on an 8G 4-core vm | 21:03 |
trown | we could probably tune those down by default too | 21:05 |
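(The counts stevebaker quotes come from per-service worker defaults scaling with cores; pinning them down on a CI undercloud could look like this, using the standard nova.conf worker options, with the values chosen arbitrarily for illustration:)

```bash
sudo crudini --set /etc/nova/nova.conf DEFAULT osapi_compute_workers 2
sudo crudini --set /etc/nova/nova.conf DEFAULT metadata_workers 2
sudo crudini --set /etc/nova/nova.conf conductor workers 2
sudo systemctl restart openstack-nova-api openstack-nova-conductor
```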
*** thrash|brb is now known as thrash | 21:13 | |
openstackgerrit | Brad P. Crochet proposed openstack/tripleo-common: Upgrades: Add post-upgrade stack update https://review.openstack.org/288744 | 21:15 |
*** penick has quit IRC | 21:16 | |
shardy | We do already tune these workers somewhat for the overcloud: https://review.openstack.org/#/c/273431/8/toci_instack.sh | 21:17 |
shardy | e.g in CI | 21:17 |
shardy | anyway, night all, have a good weekend :) | 21:17 |
*** dprince has quit IRC | 21:18 | |
bnemec | You too, shardy | 21:18 |
*** shardy has quit IRC | 21:18 | |
openstackgerrit | Brad P. Crochet proposed openstack/python-tripleoclient: Upgrades: Add --post option to 'stack upgrade' https://review.openstack.org/288728 | 21:20 |
openstackgerrit | James Slagle proposed openstack/tripleo-heat-templates: Make AllNodesExtraConfig depend on the validation deployments https://review.openstack.org/288747 | 21:22 |
*** dmsimard has quit IRC | 21:23 | |
*** shivrao has joined #tripleo | 21:25 | |
*** penick has joined #tripleo | 21:38 | |
*** jayg is now known as jayg|g0n3 | 21:40 | |
*** rlandy has quit IRC | 21:43 | |
*** rlandy has joined #tripleo | 21:44 | |
*** r-mibu has quit IRC | 21:47 | |
*** r-mibu has joined #tripleo | 21:47 | |
*** mburned_out is now known as mburned | 21:50 | |
*** gfidente has joined #tripleo | 21:55 | |
openstackgerrit | Richard Su proposed openstack/tripleo-heat-templates: Store events in Ceilometer https://review.openstack.org/287561 | 21:58 |
*** pcaruana has joined #tripleo | 21:58 | |
gfidente | yeh! I'm going to get ipv6 working, no matter what | 21:59 |
*** lblanchard has quit IRC | 21:59 | |
*** cwolferh has quit IRC | 22:02 | |
*** cwolferh has joined #tripleo | 22:04 | |
*** rlandy has quit IRC | 22:04 | |
*** dustins has quit IRC | 22:10 | |
*** weshay has quit IRC | 22:13 | |
Erming_ | trown: are you there? a question about the overcloud: | 22:24 |
Erming_ | trown: why, after the deployment, are almost all the services inactive (disabled on boot) except for the swift ones? | 22:25 |
trown | Erming_: sorry, I have to run, but if it is an HA or pacemaker setup, it is because pacemaker is managing the services, not systemd | 22:26 |
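(trown's answer is verifiable on an HA overcloud controller: the cluster manager, not systemd, starts the services, so systemd reporting them disabled is expected. For example:)

```bash
sudo pcs status                            # pacemaker-managed resources and their state
systemctl is-enabled openstack-glance-api  # "disabled" here is normal under pacemaker
```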
*** trown is now known as trown|outtypewww | 22:26 | |
Erming_ | trown: Thanks | 22:26 |
Erming_ | trown|outtypewww: Have a great weekend (hope you see it though :-) | 22:27 |
*** rhallisey has quit IRC | 22:42 | |
*** penick has quit IRC | 22:48 | |
*** dsneddon has quit IRC | 22:54 | |
*** xinwu has quit IRC | 22:56 | |
*** tiswanso has quit IRC | 22:58 | |
*** xinwu has joined #tripleo | 22:59 | |
*** xinwu has quit IRC | 22:59 | |
*** Goneri has quit IRC | 23:00 | |
*** xinwu has joined #tripleo | 23:02 | |
*** xinwu has quit IRC | 23:02 | |
*** morazi has quit IRC | 23:04 | |
*** rwsu has quit IRC | 23:08 | |
*** rwsu has joined #tripleo | 23:09 | |
*** mbound has joined #tripleo | 23:13 | |
*** Goneri has joined #tripleo | 23:16 | |
*** trozet has quit IRC | 23:32 | |
*** jobewan has joined #tripleo | 23:35 | |
*** jdob has quit IRC | 23:36 | |
gfidente | EmilienM, you around? | 23:50 |
*** mbound has quit IRC | 23:59 |