Friday, 2016-01-29

[00:01] *** sthillma has quit IRC
[00:02] *** sthillma has joined #tripleo
[00:09] *** rwsu has quit IRC
[00:12] *** alop has joined #tripleo
[00:16] *** rwsu has joined #tripleo
[00:18] *** panda_ has quit IRC
[00:19] *** panda_ has joined #tripleo
[00:26] *** cwolferh has quit IRC
[00:27] *** cwolferh has joined #tripleo
[00:29] <openstackgerrit> Matthew Thode proposed openstack/diskimage-builder: package-installs doesn't work in python3/gentoo  https://review.openstack.org/273782
[00:32] <openstackgerrit> Matthew Thode proposed openstack/diskimage-builder: add gentoo support to growroot  https://review.openstack.org/273769
[00:34] *** panda_ has quit IRC
[00:34] *** panda_ has joined #tripleo
[00:34] *** dmacpher has joined #tripleo
[00:43] *** Marga_ has quit IRC
[00:45] *** panda_ has quit IRC
[00:45] *** panda_ has joined #tripleo
[00:49] *** ayoung has joined #tripleo
[00:49] *** chlong has joined #tripleo
[00:51] <openstackgerrit> Matthew Thode proposed openstack/diskimage-builder: add support for openrc to dib-init-system  https://review.openstack.org/273772
[00:52] <openstackgerrit> Matthew Thode proposed openstack/diskimage-builder: add support for openrc to dib-init-system  https://review.openstack.org/273772
[00:52] <openstackgerrit> Matthew Thode proposed openstack/diskimage-builder: fix gentoo hardened support  https://review.openstack.org/273795
[00:56] *** panda_ has quit IRC
[00:56] *** panda_ has joined #tripleo
[00:57] *** marcusvrn_ has quit IRC
[01:02] *** ccrouch has quit IRC
[01:04] *** ccrouch has joined #tripleo
[01:04] *** panda_ has quit IRC
[01:05] *** panda_ has joined #tripleo
[01:09] *** lazy_prince has joined #tripleo
[01:10] *** yamahata has joined #tripleo
[01:12] <openstackgerrit> Matthew Thode proposed openstack/diskimage-builder: add support for gentoo to simple-init  https://review.openstack.org/273790
[01:24] <openstackgerrit> James Slagle proposed openstack/tripleo-heat-templates: Add swap to overcloud nodes  https://review.openstack.org/273752
[01:24] *** cwolferh has quit IRC
[01:25] *** cwolferh has joined #tripleo
[01:28] *** penick has quit IRC
[01:29] *** rlandy has quit IRC
[01:32] *** mgagne has quit IRC
[01:32] *** mgagne has joined #tripleo
[01:35] *** pradk has quit IRC
[01:51] *** pradk has joined #tripleo
[01:53] <prometheanfire> are triple-o checks failing for dib?
[01:59] *** tiswanso has joined #tripleo
[01:59] *** tiswanso has quit IRC
[02:00] *** tiswanso has joined #tripleo
[02:08] *** yamahata has quit IRC
[02:11] *** sthillma has quit IRC
[02:12] *** cwolferh has quit IRC
[02:16] *** shivrao has quit IRC
[02:16] *** xinwu has quit IRC
[02:23] <openstackgerrit> Dan Sneddon proposed openstack/tripleo-heat-templates: Add TripleO Heat Template Parameters for Neutron Tenant MTU  https://review.openstack.org/273847
[02:36] *** coolsvap|away is now known as coolsvap
[02:42] *** tzumainn has quit IRC
[02:51] *** mgagne has quit IRC
[02:51] *** mgagne has joined #tripleo
[02:59] *** Marga_ has joined #tripleo
[03:01] *** Marga_ has quit IRC
[03:02] *** Marga_ has joined #tripleo
[03:16] *** pradk has quit IRC
[03:23] *** panda_ has quit IRC
[03:23] *** panda_ has joined #tripleo
[03:28] *** pradk has joined #tripleo
[03:29] *** panda_ has quit IRC
[03:30] *** panda_ has joined #tripleo
[03:30] *** alop has quit IRC
[03:30] *** coolsvap is now known as coolsvap|away
[03:34] *** lazy_prince has quit IRC
[03:38] *** yuanying_ has joined #tripleo
[03:40] *** yuanying has quit IRC
[03:45] *** panda_ has quit IRC
[03:45] *** panda_ has joined #tripleo
[03:46] *** zaneb has quit IRC
[03:49] *** coolsvap|away is now known as coolsvap
[04:06] *** yuanying has joined #tripleo
[04:06] *** yuanying_ has quit IRC
[04:11] *** yuanying has quit IRC
[04:14] *** panda_ has quit IRC
[04:14] *** panda_ has joined #tripleo
[04:17] <openstackgerrit> Matthew Thode proposed openstack/diskimage-builder: Add support for OpenRC to dib-init-system  https://review.openstack.org/273772
[04:17] *** yuanying has joined #tripleo
[04:19] *** panda_ has quit IRC
[04:20] *** panda_ has joined #tripleo
[04:28] *** panda_ has quit IRC
[04:29] *** panda_ has joined #tripleo
[04:32] <openstackgerrit> Merged openstack/diskimage-builder: Fix debian-minimal image building  https://review.openstack.org/273544
[04:34] *** panda_ has quit IRC
[04:34] *** panda_ has joined #tripleo
[04:35] <openstackgerrit> Merged openstack/diskimage-builder: Don't use wc -l for the umount check  https://review.openstack.org/273386
[04:36] *** tiswanso has quit IRC
[04:37] *** masco has joined #tripleo
[04:41] *** panda_ has quit IRC
[04:41] *** panda_ has joined #tripleo
[04:47] *** lazy_prince has joined #tripleo
[04:53] <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder: Only match #!/bin/bash in scripts  https://review.openstack.org/273879
[04:54] *** panda_ has quit IRC
[04:55] *** panda_ has joined #tripleo
[05:09] *** killer_prince has joined #tripleo
[05:09] *** anande has joined #tripleo
[05:10] *** anande has quit IRC
[05:10] <openstackgerrit> Matthew Thode proposed openstack/diskimage-builder: fix gentoo hardened support  https://review.openstack.org/273795
[05:12] *** chlong has quit IRC
[05:12] *** lazy_prince has quit IRC
[05:15] *** coolsvap is now known as coolsvap|away
[05:19] *** panda_ has quit IRC
[05:20] *** panda_ has joined #tripleo
[05:20] *** jaosorior has joined #tripleo
[05:20] *** panda_ has quit IRC
[05:21] *** panda_ has joined #tripleo
[05:25] *** lazy_prince has joined #tripleo
[05:28] *** killer_prince has quit IRC
[05:32] *** chlong has joined #tripleo
[05:58] *** Marga_ has quit IRC
[06:03] *** shivrao has joined #tripleo
[06:28] *** chlong has quit IRC
[06:29] *** rcernin has quit IRC
[06:29] <jaosorior> marios: Are you around?
[06:31] *** coolsvap|away is now known as coolsvap
[06:32] *** xinwu has joined #tripleo
[06:33] *** yamahata has joined #tripleo
[06:39] *** chlong has joined #tripleo
[06:40] *** jaosorior has quit IRC
[06:41] *** dmacpher has quit IRC
[06:43] *** Marga_ has joined #tripleo
[06:47] *** jaosorior has joined #tripleo
[06:52] *** jprovazn has joined #tripleo
[06:52] *** jprovazn has quit IRC
[06:52] *** jprovazn has joined #tripleo
[06:53] <marios> jaosorior: o/ morning
[06:56] <jaosorior> marios: Hey dude, how's it going?
[06:57] <jaosorior> marios: Hey, I'm checking out the pingtest, and noticed some functions such as
[06:57] <jaosorior> "tripleo wait_for" and "tripleo user-config"
[06:58] <jaosorior> marios: Such as here https://github.com/openstack/tripleo-common/blob/master/scripts/tripleo.sh#L552 and here https://github.com/openstack/tripleo-common/blob/master/scripts/tripleo.sh#L538
[06:58] <jaosorior> marios: Where do these come from?
[07:00] <marios> jaosorior: they are some of the original tripleo-incubator scripts
[07:00] <marios> sec
[07:01] *** panda_ has quit IRC
[07:01] *** shivrao has quit IRC
[07:01] *** panda_ has joined #tripleo
[07:01] <marios> jaosorior: https://github.com/openstack/tripleo-incubator/tree/master/scripts  on an undercloud env, you'll find them in /usr/libexec/openstack-tripleo/
[07:02] *** aufi has joined #tripleo
[07:06] <jaosorior> marios: Thanks dude
[07:07] <marios> np man
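As we understand it, the incubator's `wait_for` helper re-runs a command until it succeeds or gives up. The pingtest relies on this when it waits for the floating IP to answer. The following minimal Python sketch illustrates that retry-until-success pattern. It is not a port of the incubator script: the function name, the `attempts`/`delay` parameters, and their defaults are assumptions for illustration.

```python
import subprocess
import time


def wait_for(command, attempts=10, delay=5.0):
    """Hypothetical helper: re-run `command` until it exits with status 0.

    Returns True as soon as one run succeeds, or False once all
    `attempts` runs have failed. Sleeps `delay` seconds between runs.
    """
    for attempt in range(1, attempts + 1):
        # Only the exit status matters, so discard the command's output.
        result = subprocess.run(command,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```

For example, `wait_for(["ping", "-c", "1", "192.0.2.51"], attempts=30, delay=10)` would keep pinging the floating IP from the failing job for roughly five minutes before reporting failure.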
[07:12] *** devvesa has joined #tripleo
[07:25] *** rcernin has joined #tripleo
[07:26] *** dshulyak has joined #tripleo
[07:26] <jaosorior> marios: hey dude, have you been looking into the CI failures regarding the pingtest?
[07:27] <marios> jaosorior: there was a failure yesterday but it was about overcloud heat dying like 503 service unavailable
[07:27] <marios> jaosorior: i haven't looked today yet
[07:27] <marios> jaosorior: err, yes during the pingtest
[07:28] <jaosorior> marios: Oh, I see. I was trying to check out the logs, but damn they don't really have much :/
[07:28] <marios> jaosorior: spoke briefly with derekh and apparently there was issue with insufficient ram on the overcloud nodes
[07:28] <marios> jaosorior: sec
[07:28] <marios> jaosorior: this is what i mean http://logs.openstack.org/13/260413/9/check-tripleo/gate-tripleo-ci-f22-ha/d5fff90/console.html#_2016-01-27_18_55_34_646
[07:29] <jaosorior> marios: I see. Started checking it out today, and only thing I saw was something about the compute not being able to assign a vcpu to a domain
[07:29] <marios> jaosorior: (give it a sec, should load to exact time)
[07:30] <jaosorior> marios: The error I saw in another commit was that it was actually able to create the tenant, but the ping itself failed
[07:30] <jaosorior> marios: http://logs.openstack.org/51/273751/1/check-tripleo/gate-tripleo-ci-f22-ceph/91011a9/console.html#_2016-01-28_22_32_12_699
[07:32] *** chlong has quit IRC
[07:33] <jaosorior> marios: This is another commit and it shows the same error: http://logs.openstack.org/52/273752/3/check-tripleo/gate-tripleo-ci-f22-ceph/b3ea7ec/console.html#_2016-01-29_03_18_51_582
[07:33] <jaosorior> So yeah, apparently we've been looking at different things
[07:36] <marios> jaosorior: reading sec (sry doing sthing else gimme few)
[07:36] *** hjensas has joined #tripleo
[07:37] <marios> jaosorior: yeah that one is different, there the vm actually couldn't be pinged
[07:39] <openstackgerrit> greghaynes proposed openstack/diskimage-builder: WIP: Move hook generation in to python  https://review.openstack.org/271139
[07:39] <jaosorior> marios: Yeah, those are the ones I've been looking at
[07:39] <jaosorior> marios: Maybe the lack of memory might be somewhat addressed by this CR https://review.openstack.org/#/c/273752/
[07:45] <marios> jaosorior: so here's the thing. the pingtest initially used the fedora-user image, instead of the cirros image. Mainly because I had intermittent (and seemingly random) issues with pinging the cirros vms (we were also then building fedora-user by default so it made sense at the time)
[07:47] <marios> jaosorior: i don't know if this is related in this case, but if we continue to see this we may need to revisit
[07:47] <jaosorior> marios: Yeah, maybe it would make sense to look for another small footprint alternative to cirros
[07:48] <jaosorior> marios: Then again, I'm not sure if the logs in nova that say that it couldn't allocate a vcpu to the domain are related. I am yet to reproduce it in my environment.
[07:51] *** mbound has joined #tripleo
[07:54] <openstackgerrit> greghaynes proposed openstack/diskimage-builder: Move hook generation in to python  https://review.openstack.org/271139
[07:55] <openstackgerrit> greghaynes proposed openstack/diskimage-builder: Move hook generation in to python  https://review.openstack.org/271139
[08:04] *** fgimenez has joined #tripleo
[08:04] *** fgimenez has quit IRC
[08:04] *** fgimenez has joined #tripleo
[08:08] *** mbound has quit IRC
[08:19] *** dshulyak has left #tripleo
[08:21] *** dshulyak has joined #tripleo
[08:23] *** athomas has joined #tripleo
[08:24] *** Marga_ has quit IRC
[08:26] *** pblaho has joined #tripleo
[08:28] *** rasca has quit IRC
[08:29] *** rasca has joined #tripleo
[08:30] *** panda_ has quit IRC
[08:30] <jaosorior> marios: Well, bnemec added a patch to debug the ping errors. Here it is if you wanna check it out: https://review.openstack.org/273701 I'm following it to see if I can get any info out of it
[08:30] *** panda_ has joined #tripleo
[08:32] <marios> jaosorior: thanks (heh and of course the pingtest passed on the ha job this time :/ http://logs.openstack.org/01/273701/1/check-tripleo/gate-tripleo-ci-f22-ha/3edb145/console.html#_2016-01-28_23_19_12_108 )
[08:34] <jaosorior> marios: Hahaha yeah dude, it's a very sneaky thing to debug
[08:35] *** masco has quit IRC
[08:37] *** jcoufal has joined #tripleo
[08:46] *** bvandenh has joined #tripleo
[08:50] *** mbound has joined #tripleo
[08:55] *** shardy has joined #tripleo
[08:56] *** derekh has joined #tripleo
[08:56] *** bvandenh has quit IRC
[09:01] *** shardy has quit IRC
[09:02] *** shardy has joined #tripleo
[09:04] *** panda_ has quit IRC
[09:05] *** panda_ has joined #tripleo
[09:07] *** jcoufal has quit IRC
[09:08] *** jcoufal has joined #tripleo
[09:09] *** bvandenh has joined #tripleo
[09:13] *** gfidente has joined #tripleo
[09:14] <derekh> shardy: that change to set HeatWorkers to 1, only changes the API workers, not the engine
[09:15] *** coolsvap is now known as coolsvap|away
[09:15] <derekh> shardy: it doesn't look like the puppet-heat module exposes num_engine_workers
[09:15] <shardy> derekh: hrm
[09:16] <shardy> doh
[09:17] <shardy> I guess we'll have to fix that then - although don't the puppet modules allow for setting arbitrary config values?
[09:17] * shardy tries via extraconfig
[09:18] *** nico_auv has joined #tripleo
[09:19] <shardy> https://review.openstack.org/#/c/269071/
[09:19] <shardy> Yeah I think we can do it via controllerExtraConfig instead
[09:19] <shardy> testing now
[09:19] <derekh> shardy: ack, thanks, if this doesn't work, I'll bump up the overcloud ram over the weekend,
[09:20] * derekh will also look into setting some other workers to 1
[09:20] <shardy> kk, sorry I threw that patch up yesterday assuming it'd work, should've tested it locally really
[09:21] <shardy> we could completely disable the cfn and cloudwatch APIs on the overcloud as well, seeing as we're not testing them
[09:22] *** jaosorior has quit IRC
[09:22] *** jaosorior has joined #tripleo
[09:27] *** paramite has joined #tripleo
[09:30] *** jistr has joined #tripleo
[09:34] *** panda_ has quit IRC
[09:35] *** panda_ has joined #tripleo
[09:38] *** links has joined #tripleo
[09:39] <openstackgerrit> Merged openstack/tripleo-heat-templates: Bump the pacemaker service op_params to 200s for start and stop  https://review.openstack.org/272026
[09:40] *** paramite has quit IRC
[09:42] *** paramite has joined #tripleo
[09:52] *** hjensas has quit IRC
[09:53] *** jcoufal_ has joined #tripleo
[09:53] *** paramite has quit IRC
[09:54] *** jcoufal has quit IRC
[09:55] *** paramite has joined #tripleo
[09:55] *** regebro has quit IRC
[09:56] <shardy> derekh: Hey, failing to get the right hiera key structure here:
[09:56] <shardy> https://github.com/openstack/puppet-heat/blob/master/manifests/config.pp
[09:57] <shardy> any idea what key will be needed to feed in a value for DEFAULT/num_engine_workers?
[09:57] <shardy> we include the manifest from https://review.openstack.org/#/c/269071/3/puppet/manifests/overcloud_controller.pp
[09:57] *** jcoufal has joined #tripleo
[09:58] <derekh> shardy: off the top of my head I don't know, will play with it in a few minutes and see if I can find out
[09:59] <shardy> derekh: thanks, I've tried a few things but evidently doing it wrong
[09:59] <shardy> the confusing thing is the values take a hash containing a value key
[09:59] <derekh> shardy: ack
[10:00] *** jcoufal_ has quit IRC
[10:07] <jaosorior> shardy: I think you can actually set arbitrary config values. I did that for keystone
[10:08] *** jaosorior has quit IRC
[10:08] *** jaosorior has joined #tripleo
[10:08] <jaosorior> shardy: This is the way you would set it https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/hieradata/controller.yaml#L48
[10:09] <shardy> jaosorior: aha!
[10:09] <shardy> thanks
[10:10] <shardy> I was trying to do heat::config::DEFAULT/num_engine_workers
[10:11] <jaosorior> shardy: You could use heat_config { 'some name': 'DEFAULT/num_engine_workers' => {value => 'something something'}}
[10:11] <jaosorior> but that's with the provider object
[10:11] <jaosorior> better add it to hieradata like the example I put up there. Also, I've been told it's a better way to do it (less clutter in the puppet manifests)
[10:12] <shardy> jaosorior: ack, yeah I need to do it via hiera as I'm using the controllerExtraConfig parameter
[10:12] *** aufi_ has joined #tripleo
[10:13] <jaosorior> shardy: Makes sense
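The nesting that tripped shardy up (a `SECTION/option` key whose value is itself a hash with a `value` key) is easier to see written out. The minimal Python sketch below builds that structure and wraps it in `controllerExtraConfig`. It assumes, following the keystone example linked above, that puppet-heat's `heat::config` class accepts a hash parameter named `heat_config`; that key name, the helper functions, and the worker count of 1 are illustrative assumptions, not taken from the actual CI patch.

```python
import json


def heat_config_override(option, value):
    """Hypothetical helper: hiera data setting one heat.conf option.

    Assumes heat::config::heat_config maps 'SECTION/option' keys to
    hashes that carry the setting under a 'value' key.
    """
    return {"heat::config::heat_config": {option: {"value": value}}}


def extra_config_environment(hiera):
    """Hypothetical helper: wrap hiera data in a Heat environment."""
    return {"parameter_defaults": {"controllerExtraConfig": hiera}}


env = extra_config_environment(
    heat_config_override("DEFAULT/num_engine_workers", 1))

# JSON output is also valid YAML here, so it could be saved as an
# environment file and passed to the overcloud deploy.
print(json.dumps(env, indent=2))
```

Compared with the flat key `heat::config::DEFAULT/num_engine_workers`, the option name appears one level down, inside the `heat_config` hash, and the setting sits one level below that.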
[10:14] *** aufi has quit IRC
[10:15] <jaosorior> marios: the patch adding more output to pingtest in case there are errors finally failed like we wanted it to. Here you go if you wanna check it out too http://logs.openstack.org/01/273701/1/check-tripleo/gate-tripleo-ci-f22-nonha/7da6bd4/console.html#_2016-01-29_09_36_50_341
[10:17] <marios> thanks jaosorior
[10:18] *** panda_ has quit IRC
[10:18] *** jcoufal has quit IRC
[10:19] *** panda_ has joined #tripleo
[10:22] <jaosorior> marios: Seems that the neutron openvswitch agent crashes at some point. Failing with "Cannot allocate memory"
[10:22] <jaosorior> marios: And if you see some of the output from neutron, it doesn't really crash but it fails to get some resources, giving outputs such as router not found, and the same for the port when trying to get it
[10:24] <marios> jaosorior: interesting... are you getting this from the controller logs at http://logs.openstack.org/01/273701/1/check-tripleo/gate-tripleo-ci-f22-nonha/7da6bd4/logs/
[10:24] *** athomas has quit IRC
[10:24] <jaosorior> marios: Yes
[10:24] <marios> jaosorior: ok gonna have a closer look in a sec thx
[10:25] *** tremble has joined #tripleo
[10:25] *** jcoufal has joined #tripleo
[10:25] <marios> jaosorior: fwiw, i remember at the time, that rebooting the vm helped (made it "pingable" again) but i never tracked down the actual cause
[10:26] <jaosorior> marios: The interesting bits are in /var/log/neutron/openvswitch-agent.log and /var/log/neutron/server.log
[10:26] *** mcornea has joined #tripleo
[10:27] <marios> jaosorior: wow 51 - - - - -] OSError: [Errno 12] Cannot allocate memory wtf
[10:29] <jaosorior> marios: fun fun fun :D
[10:32] <jaosorior> shardy: The issue that you're trying to fix with the num_engine_workers is also related to the running-out-of-memory errors, right?
[10:32] <shardy> yup
[10:33] <jaosorior> shardy: Will that help with the memory problems in the overcloud nodes? or is that just for the undercloud?
[10:34] <shardy> http://paste.openstack.org/show/485383/
[10:35] <shardy> that brings the overcloud memory usage down somewhat
[10:35] <jaosorior> shardy: That looks about right
[10:36] <shardy> there's more we can do I think
[10:36] <openstackgerrit> Steven Hardy proposed openstack-infra/tripleo-ci: Override HeatWorkers parameter for deployed overcloud  https://review.openstack.org/273431
[10:36] <shardy> we're still running multiple nova conductors for example
[10:36] * shardy looks for more things to turn off
[10:37] <jaosorior> is that patch directed to tripleo-ci because it's meant for testing? Maybe it should be considered to be added to t-h-t
[10:38] <openstackgerrit> Steven Hardy proposed openstack-infra/tripleo-ci: Minimise memory usage for deployed overcloud  https://review.openstack.org/273431
[10:38] <shardy> jaosorior: It's only directed at CI because we're trying to use not-enough RAM
[10:38] <shardy> the defaults make much less sense for real deployments
[10:38] <shardy> we could document the environment for developers tho
[10:38] <jaosorior> that makes sense
[10:38] <shardy> the t-h-t defaults make more sense for production/real usage as they are IMO
[10:48] <derekh> shardy: don't have a running overcloud at the moment to test on, but does this work for you ? http://paste.openstack.org/show/485385/
[10:49] <shardy> derekh: thanks - jaosorior actually pointed me at a keystone example and I've got it working now:
[10:49] <shardy> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/hieradata/controller.yaml#L48
[10:49] <shardy> https://review.openstack.org/#/c/273431/5/toci_instack.sh
[10:49] <derekh> shardy: ok, cool
[10:50] <shardy> it's a bit of a weird syntax
[10:50] <derekh> yup
[10:50] <shardy> Several other services also don't scale down not-api processes via *Workers
[10:50] <shardy> so I'll push another update shortly reducing nova/neutron backend workers
[10:51] <shardy> In total it reduces my memory usage by nearly a gig!
[10:52] <jaosorior> shardy: http://weknowmemes.com/wp-content/uploads/2012/11/mexcellent.jpg
[10:53] <marios> shardy: \o/
[10:53] <shardy> http://giphy.com/gifs/vMnuZGHJfFSTe
[10:54] *** athomas has joined #tripleo
[10:56] *** tosky has joined #tripleo
[11:01] *** olap has quit IRC
[11:03] <openstackgerrit> Merged openstack/python-tripleoclient: We do not need to pass a NeutronControlPlaneID  https://review.openstack.org/265320
[11:05] <jistr> shardy: hi, i'm trying to do kilo->liberty upgrades, and apart from other errors i've been able to work around for the time being (e.g. worked around https://bugs.launchpad.net/heat/+bug/1538551 by reverting the NtpServer patch linked there, and hit similar issues with other params too), i'm seeing this happen http://fpaste.org/315822/53997844/
[11:05] <openstack> Launchpad bug 1538551 in heat "Unable to update a parameter from string to comma_delimited_list" [Undecided,New]
[11:05] *** jcoufal has quit IRC
[11:06] <jistr> shardy: do you think Heat is trying to re-deploy all servers from scratch, because we changed properties of the OS::Nova::Server resources?
[11:06] *** sbalukoff has quit IRC
[11:06] <shardy> jistr: probably - what properties have changed?
[11:07] <shardy> the one which springs to mind is the user_data
[11:07] <jistr> shardy: e.g. when i look at controller in stable/liberty, there's a bunch of them -- https://github.com/openstack/tripleo-heat-templates/blame/stable/liberty/puppet/controller.yaml#L638
[11:07] <shardy> that will, unfortunately, replace the resource atm
[11:07] <jistr> shardy: yeah, userdata, software_config_transport, metadata
[11:08] <jistr> shardy: and for nova computes, the default hostname scheme has changed
[11:08] <jistr> (as visible from the fpaste)
[11:08] <shardy> gah, it'll be the user_data which is doing it I think
[11:08] <gfidente> jistr, shardy I think the properties remained the same
[11:09] <jistr> shardy: the rest of the changes wouldn't result in redeploy? (name, metadata, software_config_transport)
[11:09] <gfidente> except for metadata?
[11:09] <gfidente> not user_data
[11:09] <jistr> gfidente: no, user_data changed too
[11:09] <jistr> we had it before
[11:09] <jistr> but it's now composed from 2 variables
[11:09] <shardy> jistr: I was thinking of adding a heat property which allows us to ignore any changes to user_data
[11:09] <shardy> that would help, but we probably can't backport it
[11:09] <shardy> http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Nova::Server-props-opt
[11:10] <shardy> that shows which properties can be updated without replacement
[11:10] <shardy> vs "Updates cause replacement"
[11:10] <gfidente> jistr, right went from type: OS::TripleO::NodeUserData to type: OS::Heat::MultipartMime ?
[11:10] <jistr> shardy: what about "locking down" a resource completely instead? wouldn't it be a better suited option for us?
[11:10] <jistr> i don't think we ever want Heat to replace a server just like that
[11:10] <shardy> jistr: That is another option, but then the update would just fail
[11:11] <shardy> ramishra has a patch up which does exactly that
[11:11] <shardy> but in this case, I think you'd want any new nodes to get the new user_data, and for existing nodes to ignore the change
[11:11] <jistr> shardy: well i was thinking something in terms of "update the heat model but don't respin the server", instead of UPDATE_FAILED
[11:11] <jistr> shardy: yea
[11:11] <shardy> jistr: but then the heat model is corrupted, in that it doesn't reflect reality anymore
[11:11] <shardy> I guess it's possible tho
[11:12] <shardy> https://review.openstack.org/#/c/253074/19/doc/source/template_guide/environment.rst
[11:12] <shardy> that's the new feature ramishra_ has been working on - I need to test it & hopefully we can land it soon
[11:13] <shardy> In this case tho, we really want a way to say all OS::Nova::Server resources anywhere in the stack are restricted_actions: replace
[11:13] <shardy> we can probably specify it for each ResourceGroup tho via wildcards
[11:14] <gfidente> feels like what we did with the NetworkDeployment actions
[11:14] <shardy> Yeah, anywhere we use SoftwareDeployment this is much easier
[11:14] <jistr> shardy: yea that sounds like what we need. So when the action is restricted this way, will the update fail or will it just ignore the action?
[11:15] <shardy> jistr: it will fail
[11:15] <jistr> ok so it's not enough to solve our problem by itself
[11:15] <shardy> no, but it's a potential step towards a solution
[11:15] <jistr> yes
[11:15] <shardy> for now, we have to figure out how to not change the causes-replacement properties
[11:16] <shardy> If it's only the user_data, I can potentially post a patch which adds a config option to allow ignoring user_data changes on update
[11:16] *** athomas has quit IRC
[11:16] <gfidente> shardy, jistr so from what I can see
[11:17] <gfidente> we used to define user_data as os::tripleo::nodeuserdata
[11:17] <gfidente> and now we make it instead a multipart which in one of its parts is os::tripleo::nodeuserdata
[11:17] <shardy> Yeah, now we pass it a multi-part mime archive
[11:17] <shardy> it was to wire in the heat-admin user
[11:17] <gfidente> but I don't think it's worth it or useful to make heat do any introspection there to see if it actually changed or not
[11:18] <shardy> Ok, give me a few mins, let me see if I can do a heat patch quickly which makes the on-update behavior configurable
[11:19] <shardy> we probably can't backport a new property, but we potentially can backport a new config option, defaulted to the current behavior
[11:19] <jistr> gfidente: +1
[11:19] <gfidente> jistr, hey why I got the +1?
[11:20] <gfidente> the not have introspection?
[11:20] <jistr> re "not useful to make heat do any introspection"
[11:20] *** jcoufal has joined #tripleo
[11:20] <gfidente> jistr, man you're diving alone
[11:20] *** akrivoka has joined #tripleo
[11:20] *** athomas has joined #tripleo
[11:20] <gfidente> I was just curious about this because it is similar to what happened with the VIPs :P
[11:21] <jistr> for OS::Nova::Server specifically, what i would see as useful is simply ignoring any updates...
[11:21] <jistr> maybe something like "ignore_actions" rather than "restrict_actions"
[11:21] <gfidente> yeah the NetworkDeployment thing
[11:21] <shardy> jistr: Let's wire in doing that for user_data now, and consider the more general case after
[11:22] <shardy> http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Nova::Server-prop-flavor_update_policy
[11:22] <gfidente> but as shardy said that is only for softwaredeployment
[11:22] <shardy> we could just have update_policy properties for each causes-replacement property, such as flavor/image
[11:22] <shardy> and add an option "IGNORE"
[11:22] *** sbalukoff has joined #tripleo
[11:22] <gfidente> that would ignore all properties
[11:23] <gfidente> or rather a list of properties to be ignored?
[11:23] <jistr> shardy: yes, "update_policy: ignore" sounds to me like a good way forward
[11:23] <jistr> gfidente: both would probably work for us... unless there's an update we can perform on OS::Nova::Server without re-spinning
[11:24] <gfidente> right because these would be for ::server only?
[11:25] <jistr> gfidente: yeah i think so
[11:25] <jistr> perhaps if one was able to specify a list of properties to ignore, it would be a more generally useful construct in heat templates
[11:25] <jistr> but to solve our problem with replacing servers, ignoring all would work i think
[11:27] *** shardy has quit IRC
[11:28] *** shardy has joined #tripleo
[11:29] <shardy> https://bugs.launchpad.net/heat/+bug/1539541
[11:29] <openstack> Launchpad bug 1539541 in heat "Can't ignore updates to OS::Nova::Server" [Undecided,New] - Assigned to Steven Hardy (shardy)
[11:29] <shardy> jistr, gfidente ^^
[11:29] <jistr> shardy: not sure where your connection dropped, here's a recent chat history http://fpaste.org/316160/40669351/
[11:29] <jistr> shardy: thanks
[11:29] <shardy> jistr: thanks
[11:30] *** xinwu has quit IRC
[11:30] *** bvandenh has quit IRC
[11:31] <shardy> jistr, gfidente: I'm planning to solve just for the user_data now, then we can address the more general problem in subsequent patches
[11:31] <shardy> can you confirm it's the user_data change which is forcing replacement in your case?
[11:31] <shardy> the more general solution will be good, but it's unlikely we can backport such a change to stable branches
[11:33] <jistr> shardy: ok will try it again with reverted patch which changes the user data. Had to rebuild my env today so i don't have everything ready yet.
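The discussion above distinguishes three possible outcomes when a server's properties change on a stack update: Heat replaces the server, a restricted `replace` action makes the update fail, or a proposed option ignores the change. The toy Python model below makes those outcomes concrete. It does not reflect Heat's actual implementation: the enum, the function, and the choice of `user_data` as the only replacement-triggering property are illustrative assumptions.

```python
from enum import Enum


class Outcome(Enum):
    NO_CHANGE = "no change"
    UPDATE_IN_PLACE = "update in place"
    REPLACE = "replace server"
    FAIL = "fail update"
    IGNORE = "ignore change"


# Toy assumption: only user_data changes force a replacement here.
REPLACEMENT_PROPERTIES = {"user_data"}


def plan_server_update(old, new, restrict_replace=False, ignored=frozenset()):
    """Hypothetical model of how a server update could be handled.

    restrict_replace mimics a restricted "replace" action (update fails);
    ignored lists properties whose changes are skipped (the proposed
    ignore behaviour).
    """
    changed = {k for k in set(old) | set(new) if old.get(k) != new.get(k)}
    effective = changed - set(ignored)
    if not changed:
        return Outcome.NO_CHANGE
    if not effective:
        # Every change was to an ignored property: keep the server as-is.
        return Outcome.IGNORE
    if effective & REPLACEMENT_PROPERTIES:
        return Outcome.FAIL if restrict_replace else Outcome.REPLACE
    return Outcome.UPDATE_IN_PLACE
```

In this model a new `user_data` value leads to `REPLACE` by default, to `FAIL` with `restrict_replace=True`, and to `IGNORE` with `ignored={"user_data"}`. That last case is the behaviour jistr asks for, in which existing servers are left alone.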
[11:45] *** fgimenez has quit IRC
[11:45] *** bvandenh has joined #tripleo
[11:45] *** fgimenez has joined #tripleo
[11:46] *** jcoufal has quit IRC
[11:46] *** athomas has quit IRC
[11:47] *** derekh has quit IRC
[12:03] *** athomas has joined #tripleo
[12:03] *** jcoufal has joined #tripleo
[12:06] *** bvandenh has quit IRC
[12:10] <jaosorior> shardy: Seems to me the commit you have for reducing the used memory failed also, the nonha case has just failed due to the pingtest, and I think it's the same issue I was discussing with marios. That openvswitch fails because of lack of memory :/
[12:10] <slagle> are we sure it's related to memory at this point?
[12:10] <slagle> i uploaded a patch to add 4gb of swap, and it failed as well
[12:11] <slagle> on the pingtest
[12:11] <jaosorior> slagle: the explicit error was that it couldn't allocate memory
[12:11] <shardy> My local controller is only using 2.7G of ram with that patch
[12:11] <jaosorior> slagle: Something like this OSError: [Errno 12] Cannot allocate memory wtf
[12:11] <jaosorior> except for the wtf
[12:11] <jaosorior> that was from me :P
[12:12] <slagle> jaosorior: that's in the openvswitch log on the controller?
[12:12] <jaosorior> slagle: yeah
[12:12] <shardy> (2.7G not doing anything admittedly)
[12:13] <slagle> ok, i don't see that in the controller logs on the failed job i'm looking at
[12:14] <jaosorior> slagle: Funky, I see that in your patch that wasn't the error. So then there's another problem there
[12:15] <jaosorior> slagle: The patches I was looking at with marios had the "OSError: [Errno 12] Cannot allocate memory" issue :/
[12:17] *** bvandenh has joined #tripleo
[12:18] *** rcernin has quit IRC
[12:18] *** rcernin has joined #tripleo
[12:38] *** weshay_xchat has joined #tripleo
[12:42] *** weshay_xchat is now known as weshay
[12:45] *** masco has joined #tripleo
[12:46] *** ukalifon has joined #tripleo
[12:47] *** julim has joined #tripleo
[12:50] *** lblanchard has joined #tripleo
[12:50] *** ukalifon has quit IRC
[12:50] *** ukalifon1 has joined #tripleo
[13:01] *** ukalifon1 has quit IRC
[13:13] *** dprince has joined #tripleo
[13:17] *** ukalifon1 has joined #tripleo
[13:18] *** trown|outttypeww is now known as trown
[13:22] *** jayg|g0n3 is now known as jayg
[13:24] *** athomas has quit IRC
[13:25] *** electrofelix has joined #tripleo
[13:28] *** fgimenez has quit IRC
[13:28] *** ukalifon1 has quit IRC
[13:32] *** fgimenez has joined #tripleo
*** jtomasek_ has joined #tripleo13:34
*** jtomasek has quit IRC13:34
*** jcoufal has quit IRC13:40
*** jcoufal has joined #tripleo13:47
ayoungfor jenkinsnumber in range(1, 8):        shouldn't this be a random number, or we always hammer the same node: https://github.com/openstack-infra/tripleo-ci/blob/master/scripts/tripleo-jobs.py#L4913:51
*** rpothier has joined #tripleo13:51
*** pradk has quit IRC13:51
jaosoriorshardy: Hey dude, have you figured out why the nonha job is failing for the CR you put up? The one for minimising the memory usage14:00
*** lazy_prince has quit IRC14:02
slaglei've been running the pingtest in a loop locally, and so far i've gotten it to fail twice14:04
slaglethe first time, i forgot it cleans up on failure though, so i disabled that and got it to fail again14:05
slaglegoing to look into it in a bit (got pulled into a meeting)14:05
*** egafford has joined #tripleo14:06
*** akuznetsov has joined #tripleo14:06
jaosoriorslagle: any hints on where I should be looking? Seems that the places where I was seeing the memory problems are no longer crashing. But there is something else going on that I haven't figured out.14:06
*** akuznetsov has quit IRC14:06
*** akuznetsov has joined #tripleo14:07
shardyjaosorior: sorry, I'm not looking into that atm, I'm working on heat patches to fix the update issue discovered by jistr & gfidente14:07
*** akuznetsov has quit IRC14:07
slaglejaosorior: none yet. the instance is up, the floating ip is associated according to nova14:07
slaglejaosorior: so i'll need to dig into some internals as to why it's not pingable14:07
*** akuznetsov has joined #tripleo14:07
*** jcoufal has quit IRC14:08
*** tzumainn has joined #tripleo14:09
slaglethe ip is set in the router ns on the controller, but i can't ping it from there either14:09
*** akuznetsov has quit IRC14:10
*** tiswanso has joined #tripleo14:11
*** tiswanso has quit IRC14:11
*** rlandy has joined #tripleo14:12
*** tiswanso has joined #tripleo14:12
jaosoriorwell, the logs I've seen haven't been too thankful14:14
jaosoriorslagle: By the way, bnemec put up a patch for debugging the issue. Might be useful to have it permanently as it only outputs more info if it fails https://review.openstack.org/#/c/273701/14:15
*** thrash|pto is now known as thrash14:16
*** panda_ has quit IRC14:18
*** jcoufal has joined #tripleo14:18
*** panda_ has joined #tripleo14:19
jaosoriorayoung: that function you posted seems to fetch data for the jenkins jobs. what's up with that?14:19
ayoungjaosorior, I'm, learning jenkins.14:20
ayoungjaosorior, and I was wondering why we always select the lowest number jenkins server14:20
jaosoriorayoung: For fun or some specific reason? :O14:20
ayoungseems like we would hammer that one14:20
ayoungjaosorior, Keystone HTTPD failed and ran postci14:20
ayoungtrying to figure out why14:20
*** jtomasek_ has quit IRC14:21
ayoungjaosorior, log is here: http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/console.html  scroll to 2016-01-28 18:50:09.65714:21
ayounghttp://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/console.html#_2016-01-28_18_50_09_68714:21
ayoungmissed a bit, but, close enough14:21
jaosoriorayoung: One thing to note is that the CI is broken, running out of memory :/14:22
ayoungjaosorior, maybe because we always run on the same jenkins machine?14:22
jaosoriorayoung: Aaaand also some weird pingtest issue that we haven't figured out14:23
jaosoriorayoung: This is where it fails: http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/console.html#_2016-01-28_18_48_34_04614:23
*** ccrouch has quit IRC14:23
ayoungjaosorior, it might be openstackclient.14:23
jaosoriorayoung: No idea about that dude, maybe derekh knows something14:23
*** jcoufal has quit IRC14:23
ayoungjaosorior, I just was looking at my tripleo deploy and running openstack server list gives an error14:23
*** Goneri has joined #tripleo14:23
ayoungit looks like the openstack client errors on any empty response14:23
ayounglet me reproduce and paste14:24
*** akrivoka has quit IRC14:24
jaosoriorayoung: Well, that specific failure is actually a regular "ping" command, and the nova server is not responding :/14:24
ayounghttp://paste.openstack.org/show/485421/14:25
ayoungjaosorior, the log shows a bunch of errors due to ebtables not being available, too14:25
ayoungthat is in worlddump.sh, so not a smoking gun, but it certainly looks ugly14:26
ayoungjaosorior, anyway, I want to see what is calling postci...14:26
*** mbound has quit IRC14:27
*** jcoufal has joined #tripleo14:27
jaosoriorfunky14:28
ayoungtripleo.sh -- Overcloud pingtest, trying to ping the floating IPs 192.0.2.5114:30
ayoungOK...so whatever is monitoring that...14:30
ayoungjaosorior, let me see if I can run that on my tripleo run14:32
ayoungjaosorior, running it now...14:33
ayoung+--------------------------------------+--------------+---------------+---------------------+--------------+14:36
ayoung| id                                   | stack_name   | stack_status  | creation_time       | updated_time |14:36
ayoung+--------------------------------------+--------------+---------------+---------------------+--------------+14:36
ayoung| c91e393a-9e69-4e60-85e4-cedf2a8043ab | tenant-stack | CREATE_FAILED | 2016-01-29T14:33:11 | None         |14:36
ayoung+--------------------------------------+--------------+---------------+---------------------+--------------+14:36
*** akrivoka has joined #tripleo14:36
ayoungjaosorior, does that leave a log?14:37
ayoungactually...let me see if I can tell how to debug...14:37
*** ccrouch has joined #tripleo14:39
ayoungshardy, if I have a Heat stack that failed, and no output from why it failed, where do I look for the log14:39
ayounghttp://paste.openstack.org/show/485423/14:40
jaosoriorayoung: You could do heat resource-list -n 5 tenant-stack14:41
jaosoriorand it will tell you which resource failed14:41
ayoungjaosorior, OK....14:41
jaosoriorthen from there, you could start debugging further14:41
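jaosorior's suggestion boils down to scanning `heat resource-list -n 5` output for the resource whose status ends in `FAILED`. A minimal Python sketch of that triage step, assuming the CLI output has already been captured as text in the same pipe-table layout as the stack list above; the helper name `failed_rows` is hypothetical and not part of heatclient.

```python
def failed_rows(table_text):
    """Return data rows of a pipe-delimited CLI table whose *status column ends in FAILED."""
    header = None
    rows = []
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' row holds the column names
            continue
        row = dict(zip(header, cells))
        # works for both 'stack_status' (stack-list) and 'resource_status' (resource-list)
        status = next((v for k, v in row.items() if k.endswith("status")), "")
        if status.endswith("FAILED"):
            rows.append(row)
    return rows
```

Applied to the stack list above it would pick out the `tenant-stack` row. As ayoung finds next, though, the pingtest cleanup may already have deleted the resources, so they can all show `DELETE_COMPLETE` by the time you look.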
*** dmacpher has joined #tripleo14:42
ayoungjaosorior, Stack not found: tenant-stack14:42
ayoungand I can run via ID14:42
ayoungheat resource-list -n 5 c91e393a-9e69-4e60-85e4-cedf2a8043ab14:42
ayoungbut they are all deleted14:42
ayounglike server1               |                      | OS::Nova::Server             | DELETE_COMPLETE | 2016-01-29T14:33:11 | tenant-stack14:43
ayoungjaosorior, going to look in nova server logs14:43
jaosoriorah yeah, now I remember that slagle had that issue. The pingtest will ultimately delete the tenant-stack after running.. you need to disable that to debug it... :/14:43
ayoungHTTP exception thrown: Flavor m1.demo14:44
ayoung could not be found.14:44
ayounghmmm14:44
ayoung bc007211-2eb0-4da6-b1d4-9bdd0f7c7943 | m1.demo   |   512 |   10 |         0 |     1 | True14:44
ayoungso something has made that,14:44
ayoung...14:44
shardyhttps://github.com/openstack/tripleo-common/blob/master/scripts/tripleo.sh#L50414:44
jaosoriorshardy: Any idea where the m1.demo flavor is created?14:45
*** ccrouch1 has joined #tripleo14:46
ayoungjaosorior, I'm running the ping test again, and tail -f the nova log...14:46
shardyHmm, no actually I don't14:46
shardyhttps://github.com/openstack/tripleo-common/blob/master/templates/tenantvm_floatingip.yaml#L2214:47
shardythat's where it's referenced14:47
*** ccrouch has quit IRC14:48
ayoungThere seems to be this line repeated14:49
ayoung2016-01-29 14:48:16.731 13030 INFO nova.api.openstack.wsgi [req-8280787b-eea7-407c-a360-70f0fab7cf1c 4012f6d4a6b6456c825e430dbcca9c53 ff789e71a7ad47b1a36b9fba084479f8 - - -] HTTP exception thrown: Flavor m1.demo could not be found.14:49
jaosoriorayoung: Seems to me that m1.demo is assumed to exist in the overcloud14:49
jaosoriornot sure where it gets created though14:49
ayoungjaosorior, it seems to be there, but the ID is a uuid, not a single digit integer14:50
shardyLooks like tripleoclient does it in overcloud_deploy.py14:50
openstackgerritDerek Higgins proposed openstack-infra/tripleo-ci: Retry the ping test on failure  https://review.openstack.org/27409414:51
ayoungIf one Derek Higgins was logged in to IRC....14:51
ayoungwe'd ask him14:51
openstackgerritMerged openstack/python-tripleoclient: Set NeutronMetadataProxySharedSecret  https://review.openstack.org/26813114:51
ayoungshardy, so the log of the run shows this http://paste.openstack.org/show/485424/14:52
ayounghow does Heat decide that the stack has failed?14:53
shardyhttps://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_deploy.py#L49314:53
shardyayoung: I think the error was before that but the CI didn't catch the failure correctly an exit14:53
shardywe presumably failed to create the flavor there (no, I don't know why we're doing that in the client btw)14:54
ayoungshardy, I'm running this on a Tripleo I set up myself, not CI14:54
ayoungI have that flavor defined in my overcloud like this:14:54
shardyayoung: source overcloudrc then do nova flavor-list14:54
shardyIs there a m1.demo?14:54
ayounghttp://paste.openstack.org/show/485425/14:55
*** akuznetsov has joined #tripleo14:55
ayoungthat is overcloud14:55
shardyodd, the heat stack makes that exact same call then decides it can't see the m1.demo image14:58
jaosoriorO_O14:58
*** jcoufal has quit IRC14:59
*** akuznetsov has quit IRC14:59
*** jcoufal has joined #tripleo15:00
*** morazi has joined #tripleo15:02
ayoungshardy, how do you know that?15:02
shardyayoung: The pingtest creates a server via a heat template, which I linked earlier15:03
ayoungshardy, looking at the heat template now15:03
shardyall heat does is call novaclient15:03
shardyis it actually that nova itself can't find the flavor when we call to create the server?15:04
ayoungthat is the create line at 493... where is the failure?15:04
shardyayoung: your pasted nova-api log shows it can't find m1.demo, but we just listed it, hence I am confused15:07
shardyall heat does is a novaclient call equivalent to "nova boot"15:07
*** Goneri has quit IRC15:08
ayoungshardy, let me try to do that manually.15:09
*** Goneri has joined #tripleo15:10
*** ukalifon1 has joined #tripleo15:10
*** jcoufal_ has joined #tripleo15:16
ayoungshardy, maybe the issue is actually that there are no images15:16
ayoungopenstack image list15:16
ayounglist index out of range15:16
ayoungbm-deploy-ramdisk  ?15:17
*** mbound has joined #tripleo15:18
*** jcoufal has quit IRC15:18
*** pradk has joined #tripleo15:18
*** bvandenh has quit IRC15:18
*** ccrouch1 has quit IRC15:20
*** mbound_ has joined #tripleo15:21
*** ccrouch has joined #tripleo15:23
*** mbound has quit IRC15:23
*** anande has joined #tripleo15:24
*** masco has quit IRC15:24
openstackgerritJames Slagle proposed openstack/tripleo-common: Output some debug info when pingtest fails  https://review.openstack.org/27370115:26
*** pradk has joined #tripleo15:26
ayoungjaosorior, does that make sense?  That the image is missing?15:28
slaglebnemec: check out my addition to ^^15:29
bnemecslagle: Ah, good call.15:31
slaglei'm going to check if these cirros images are booting with no_timer_check15:31
slaglei dont see it in /proc/cmdline from within the instance itself15:32
slagleshouldnt I?15:33
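slagle is checking whether the cirros guest booted with `no_timer_check`, a kernel parameter that skips the kernel's boot-time check of the timer interrupt source, a check that can misbehave in heavily loaded virtual machines. Given the contents of `/proc/cmdline` captured from the instance (for example via `cat /proc/cmdline` on its console), the check is a simple token match. This Python sketch and the name `has_kernel_param` are illustrative only.

```python
def has_kernel_param(cmdline, name):
    """True if `name` appears in a kernel command line as a bare flag or as name=value."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False
```

Matching whole tokens avoids false positives from parameters that merely share a prefix.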
*** paramite is now known as paramite|afk15:33
*** akrivoka has quit IRC15:35
*** dprince has quit IRC15:35
jaosoriorayoung: that might as well be the issue. But it's kind of puzzling that it spits the wrong error then.15:36
*** fgimenez has quit IRC15:37
*** jprovazn has quit IRC15:38
*** fgimenez has joined #tripleo15:39
jaosoriorayoung: Is that system you're testing on accessible from somewhere?15:41
ayoungjaosorior, yes.  I'll PM you15:43
*** jdob has quit IRC15:43
*** jcoufal_ has quit IRC15:45
*** anande has quit IRC15:46
*** thrash has quit IRC15:47
*** xinwu has joined #tripleo15:47
*** mbound_ has quit IRC15:47
*** paramite|afk is now known as paramite15:50
*** jdob has joined #tripleo15:51
*** xinwu has quit IRC15:55
*** yamahata has quit IRC15:55
*** yamahata has joined #tripleo15:56
*** akuznetsov has joined #tripleo15:56
*** mbound has joined #tripleo15:56
*** mcornea has quit IRC15:58
*** absubram has joined #tripleo15:58
*** jhenner1 has joined #tripleo15:58
*** jhenner has quit IRC16:00
*** akuznetsov has quit IRC16:00
*** dshulyak has quit IRC16:01
ayoungshardy, jaosorior, we don't grab the controller logs in gerrit. Makes it hard to debug.  I16:01
ayoungthink that the glance issue is telling16:01
*** thrash has joined #tripleo16:02
*** thrash has joined #tripleo16:02
jaosoriorayoung: You can see the controller logs in gerrit16:03
ayoungjaosorior, in logs...looking16:04
ayoungjaosorior, where?16:06
ayounghttp://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/  only has the one subdir, under that16:06
jaosoriorayoung: http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/logs/16:06
ayounghttp://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/logs/  does not have the glance log.16:06
jaosoriorayoung: Specifically this file http://logs.openstack.org/75/213175/22/check-tripleo/gate-tripleo-ci-f22-nonha/279f3c9/logs/overcloud-controller-0.tar.xz16:06
ayoungthanks16:06
jaosoriorayoung: Only reason heat spits out for the failure in that pingtest is ClientException: resources.server1: Unknown Error (HTTP 502)16:07
*** bvandenh has joined #tripleo16:07
*** jhenner1 has quit IRC16:08
ayoungjaosorior, the fact that the glance log is full of errors and the fact that there are no images in glance: is that coincidence or causal?16:09
ayoungah...16:11
ayoung| fe176d46-682e-4303-abe5-c2316a85c828 | pingtest_image | active |16:11
ayoungso I assume pingtest cleans that up at the end16:11
jaosoriorayoung: I removed the cleanup of pingtest in the last run16:11
ayoungok..so not glance.16:12
jaosoriorbut currently there seems to be some funky thing. getting a 502 from nova16:12
ayoungwhat was the call that triggered that?16:12
ayoungis there a 502 in the nova api log?16:12
*** absubram has quit IRC16:13
jaosoriorlook16:13
ayoungjaosorior, I don't see one16:13
ayoungyou sure it is from nova?16:14
jaosoriornova-api.log shows this when I try to do server list16:14
jaosorior2016-01-29 16:13:36.788 13015 INFO oslo_service.service [-] Child 19808 killed by signal 916:14
jaosorior2016-01-29 16:13:37.008 19831 INFO nova.osapi_compute.wsgi.server [req-9a6f67fb-0f54-4cd6-878d-d1433deee590 - - - - -] (19831) wsgi starting up on http://192.0.2.22:8774/16:14
ayoungrun it again, please16:15
ayoungI bet it is an out of memory... let me look in the journal16:15
jaosoriornow it's hanging16:15
ayoungJan 29 16:15:05 overcloud-controller-0 cinder-scheduler[12156]: 2016-01-29 16:15:05.868 12156 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.0.2.22:5672 is unreachable: [Errno 111] ECONNREFUSED16:16
ayoungps -ef | grep rabbit16:16
ayoung  shows nothing16:16
jaosorioryeah16:16
jaosoriorI had the feeling rabbit had died16:16
jaosoriorHad some issues with that before16:17
ayoungJan 29 14:43:39 overcloud-controller-0 cinder-volume[12118]: 2016-01-29 14:43:39.161 12252 ERROR oslo_service.periodic_task OSError: [Errno 12] Cannot allocate memory16:17
ayoungHehe16:17
ayoungthe rabbit done died16:17
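The diagnosis here comes from spotting a few telltale journal messages: `[Errno 12] Cannot allocate memory`, the AMQP `ECONNREFUSED` once rabbit is gone, and a worker `killed by signal 9`. A small Python sketch for scanning a saved journal dump (for instance from the overcloud-controller-0 log tarball) for the same signatures; the patterns are copied from the lines above, while the labels and the helper name `classify` are hypothetical interpretations, not something oslo or systemd reports.

```python
import re

# Patterns copied from the journal lines in this log; the labels are an
# illustrative guess at the likely cause, not authoritative diagnoses.
SIGNATURES = [
    (re.compile(r"\[Errno 12\] Cannot allocate memory"), "out of memory (ENOMEM)"),
    (re.compile(r"AMQP server on \S+ is unreachable: .*ECONNREFUSED"), "rabbitmq not listening"),
    (re.compile(r"killed by signal 9\b"), "process SIGKILLed (possibly OOM killer)"),
]


def classify(lines):
    """Count how many log lines match each failure signature."""
    counts = {}
    for line in lines:
        for pattern, label in SIGNATURES:
            if pattern.search(line):
                counts[label] = counts.get(label, 0) + 1
    return counts
```

Seeing ENOMEM first and ECONNREFUSED later is consistent with the conclusion above: memory ran out, rabbit died, and every service that talks to it started failing.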
jaosoriorperkele16:17
*** derekh has joined #tripleo16:18
jaosoriorayoung: Might be worth re-deploying with this addition https://review.openstack.org/#/c/273752/16:18
ayoung"Billy Jean is not my lover..."16:18
ayoungoooh, not if we can help it16:19
jaosoriorthat's slagle's change adding a swap partition to the overcloud nodes16:19
*** mbound has quit IRC16:23
*** aufi_ has quit IRC16:24
*** jhenner has joined #tripleo16:24
*** panda_ has quit IRC16:25
*** panda_ has joined #tripleo16:26
*** rcernin has quit IRC16:27
*** paramite is now known as paramite|afk16:28
*** ccrouch1 has joined #tripleo16:30
*** jaosorior has quit IRC16:31
*** ccrouch has quit IRC16:32
EmilienMgfidente: please look https://review.openstack.org/#/c/27126016:35
slaglebnemec: ok, fwiw, the fedora cloud image has no_timer_check baked into the kernel cmdline, so i think we need to switch to use that16:36
slagleand it also took 134s to boot, plus the time to associate the floating ip16:36
*** ukalifon1 has quit IRC16:36
slagleso need to up the timeout as well16:36
bnemecYeah, that's uncomfortably close to 180, especially in heavily loaded CI systems.16:37
openstackgerritBen Nemec proposed openstack-infra/tripleo-ci: Make pingtest partially voting  https://review.openstack.org/27415116:40
bnemecslagle: marios: ^16:40
openstackgerritBen Nemec proposed openstack/tripleo-heat-templates: Add swap to overcloud nodes  https://review.openstack.org/27375216:42
derekhI've reproduced the ping test failures16:46
derekhthe console log for the instance on the overcloud shows this http://paste.openstack.org/show/485454/16:46
derekhand it stops there16:46
derekhany ideas ?16:46
derekhfric it, that what yer talking about16:47
derekhslagle: bnemec ^ is that what ye are seeing ?16:47
bnemecFFS.  How has cirros not fixed that?16:48
bnemecAnd how does this not kill the other OpenStack CI jobs?16:48
bnemecBecause yeah, that's exactly the no_timer_check problem.16:49
*** absubram has joined #tripleo16:49
bnemecI just restored my fedora image change and am cleaning it up so it can actually merge.16:49
bnemecslagle: derekh: ^16:49
derekhbnemec: I've virsh destroy/started an image that failed and it booted the second time fine, ping works16:49
derekhbnemec: ack, lets do that16:50
*** bvandenh has quit IRC16:50
derekhI've checked the logs from shardy's "use less mem" test and it has gotten rid of the OOM problem (or pushed it down the road a bit)16:51
slaglederekh: yep, that's the no_timer_check thing i pointed out earlier16:51
derekhhttps://review.openstack.org/#/c/274151/16:52
openstackgerritBen Nemec proposed openstack/tripleo-common: Use Fedora image for ping test  https://review.openstack.org/27369916:52
*** bvandenh has joined #tripleo16:52
bnemecI doubled the ping timeout to 360.  Let me know if you think I should go higher.16:53
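The timeout was doubled from 180 to 360 seconds because slagle measured 134s just for the Fedora guest to boot, before the floating IP is even associated. The underlying logic is a poll-until-deadline loop. The Python sketch below is not the tripleo.sh implementation (which is shell), and `wait_until` and its parameters are hypothetical.

```python
import time


def wait_until(check, timeout=360, interval=10, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds have elapsed.

    Returns True on success, False on timeout. `clock` and `sleep` are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # never sleep past the deadline
        sleep(min(interval, remaining))
```

With a 134s boot and a 10s poll interval the check would first succeed at the 140s mark: inside a 360s budget, but uncomfortably close to the old 180s one once CI load and floating-IP setup are added.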
derekhbnemec: will do,16:53
*** paramite|afk is now known as paramite16:54
d0ugalIs this the correct way to do releases?16:56
d0ugalhttps://wiki.openstack.org/wiki/TripleO/ReleaseManagement16:56
slaglei'll pull that fedora user image patch and run it in a loop16:56
bnemecd0ugal: Yes.16:56
bnemecd0ugal: Although if you haven't released before you may need to get the bit flipped to allow it.16:57
trowntesting the fedora ping test as well, I think that is a much better solution than dropping the ping test from the HA job16:57
bnemecOr one of us could do it.16:57
d0ugalbnemec: ah, I want to release python-tripleoclient liberty16:57
bnemectrown: I don't think that's going to solve the problem by itself.  We'll see though.16:57
d0ugalbnemec: If you could help that would be great. I've not released before.16:58
d0ugalHow do I get the bit flipped?16:58
trownmy concern is that the HA job and the nonHA job are not testing the same templates16:59
trownso it is totally possible to break the overcloud for just HA if we turn off ping test16:59
bnemecd0ugal: slagle did it for me.16:59
bnemecSadly.  I kind of liked not being on the hook for releases. :-)16:59
bnemectrown: True. :-(16:59
slaglewhat bits need flipping?16:59
slaglebesides most of them, in a general sense17:00
trown:)17:00
d0ugalslagle: The bit that allows me to release17:00
bnemecslagle: d0ugal needs a release of tripleoclient.17:00
bnemecSpecifically liberty, although we should go ahead and release master too.17:00
d0ugalthrash: FYI ^17:00
d0ugalbnemec: Yeah, we've never released tripleoclient.17:00
*** fgimenez has quit IRC17:00
slaglei thought i gave them those perms, let me check17:00
d0ugalMaybe I just need to follow the wiki17:01
derekhSo the other error I'm seeing a lot is timeouts during the overcloud deploy, will start looking into that now17:01
*** fgimenez has joined #tripleo17:01
*** fgimenez has quit IRC17:01
*** fgimenez has joined #tripleo17:01
bnemecderekh: I have a patch up to help with that too: https://review.openstack.org/27371617:02
bnemecIt won't fix them, but at least we'll be able to debug properly (I hope).17:02
derekhI'm pretty sure we need this too, without it we get OOM errors all over the place on the overcloud https://review.openstack.org/#/c/273431/517:02
bnemecOr slagle's swap change.  Or both. :-)17:03
bnemecI thought 1 heat engine would cause messaging timeouts :-/17:03
*** dprince has joined #tripleo17:03
derekhbnemec: only on deeply nested stacks17:04
derekhbnemec: on the overcloud we are testing a simple stack17:04
bnemecDo we not qualify? :-)17:04
*** cwolferh has joined #tripleo17:04
bnemecAh, gotcha.17:04
derekhbnemec: I'm pretty sure we would qualify if it was the undercloud17:04
slagled0ugal: you're not added to do releases, so i'll just release it real quick17:05
derekhbnemec: I think thats the wrong timeout17:05
slagled0ugal: so master will become 1.0.017:05
slagleand i'll release stable/liberty as 0.1.017:05
derekhbnemec: let me see if I can confirm17:05
bnemecderekh: It doesn't take long to actually create the stack does it?  It's just waiting for the instance to finish booting that's slower on fedora.17:05
bnemecThat was my reasoning anyway.17:05
d0ugalslagle: thrash had suggested 0.1.0 for master and 0.0.11 for liberty17:06
d0ugalslagle: I guess because we have a 0.0.10 tag already17:06
d0ugalbut I don't mind either way17:06
thrashd0ugal: I'm good with 1.0.0 and 0.1.017:07
d0ugalslagle: ^17:07
slagleyea, already done :)17:07
thrashsince 0.0.10 was for kilo17:07
thrash:)17:07
d0ugalhah17:07
slagleliberty is 0.1.117:07
derekhbnemec: Sorry, I thought you did something completly different, ignore me17:08
d0ugalslagle: Thanks!17:08
bnemecderekh: Oh, I just realized you weren't talking about the fedora patch. :-)17:08
derekhbnemec: I was wrong anyways17:08
*** akuznetsov has joined #tripleo17:08
bnemecderekh: We might want to set other timeouts too.  I think this will help with the majority of the problems we're hitting though.17:08
derekhbnemec: ack17:09
slagled0ugal: thrash : it's live, http://tarballs.openstack.org/python-tripleoclient/17:09
d0ugalslagle: Awesome. Thank you.17:09
slaglederekh: bnemec : fwiw, this HA job appears to be hung, https://jenkins03.openstack.org/job/gate-tripleo-ci-f22-ha/346/console17:11
derekhbnemec: you're using a F21 cloud image, wanna bump it to F23?17:12
bnemecYeah, we hit that quite a bit: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=build_name:%20*tripleo-ci*%20AND%20build_status:%20FAILURE%20AND%20message:%20\%22GATE_RETVAL%3D137\%2217:12
devvesamarios, shardy: Sorry, I don't know what's wrong with this patch: https://review.openstack.org/#/c/260413/17:12
slaglei dont know if there's any way to investigate that17:12
devvesacan I help in something?17:12
bnemecderekh: No, I did that on purpose.  It's the exact image we were using for our downstream CI, and it worked well there.17:12
bnemecIt's also smaller than the newer fedora images for some reason.17:12
bnemecslagle: That's my overcloud timeout patch.  Right now we don't even get logs from hung runs.17:13
bnemecETOOMANYPROBLEMS17:13
* derekh should have read the commit message17:13
*** akuznetsov has quit IRC17:13
bnemec:-)17:13
derekhslagle: I'm going to log into that instance and see if I can get some logs before it times out17:15
openstackgerritBen Nemec proposed openstack-infra/tripleo-ci: Set overcloud deploy timeout  https://review.openstack.org/27371617:16
*** dprince has quit IRC17:16
bnemecI realized this morning that the comment on the first version of ^ was completely incomprehensible. :-)17:16
*** absubram has quit IRC17:17
bnemecdevvesa: I think we're mostly just trying to figure out the right combination of configurations that will get the ping test working consistently.17:17
*** dprince has joined #tripleo17:17
*** absubram has joined #tripleo17:17
derekhslagle: | fdb7e332-d1d9-4260-8032-face713b2d8c | overcloud  | CREATE_COMPLETE | 2016-01-29T15:43:36 | None         |17:19
*** bvandenh has quit IRC17:21
openstackgerritBen Nemec proposed openstack-infra/tripleo-ci: Enable undercloud ssl on nonha job  https://review.openstack.org/27374317:22
derekhslagle: bnemec so the overcloud deploy command looks like its missed the fact that the Heat stack has completed http://paste.openstack.org/show/485459/17:22
slaglederekh: ok, i've seen this locally before17:22
*** tiswanso has quit IRC17:22
slaglethe stack is create complete, but tripleoclient must not think so for some reason17:22
*** tiswanso has joined #tripleo17:23
bnemecThat's strange.  I wonder if one of its heat calls hung or something.17:24
derekh[root@instack ~]# netstat -pn | grep -i 1957817:25
derekhtcp        1      0 192.0.2.1:60506         192.0.2.1:9292          CLOSE_WAIT  19578/python217:25
derekhtcp        1      0 192.0.2.1:44483         192.0.2.1:8774          CLOSE_WAIT  19578/python217:25
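derekh's netstat output shows the deploy command (PID 19578) holding sockets in `CLOSE_WAIT` toward ports 9292 and 8774, the default glance-api and nova-api ports. `CLOSE_WAIT` means the remote side has closed the connection but the local process has not yet closed its end, which fits a client that is busy elsewhere (here, looping on heat calls). A Python sketch for pulling such sockets out of saved `netstat -pn` output; the port map and the helper name `close_wait_peers` are illustrative assumptions.

```python
# Default API ports for the two services seen in this log; extend as needed.
SERVICE_PORTS = {"9292": "glance-api", "8774": "nova-api"}


def close_wait_peers(netstat_text, pid):
    """List (service, remote address) for CLOSE_WAIT sockets owned by `pid`."""
    peers = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # netstat -pn TCP rows: proto recv-q send-q local remote state pid/program
        if len(fields) < 7 or fields[5] != "CLOSE_WAIT":
            continue
        if not fields[6].startswith(f"{pid}/"):
            continue
        port = fields[4].rsplit(":", 1)[-1]
        peers.append((SERVICE_PORTS.get(port, "unknown"), fields[4]))
    return peers
```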
derekhstrace is showing lots of calls to heat17:29
*** fgimenez has quit IRC17:30
derekhhttp://paste.openstack.org/show/485461/17:30
devvesabnemec: Thanks for the info!17:30
*** xinwu has joined #tripleo17:32
*** bvandenh has joined #tripleo17:34
*** pblaho has quit IRC17:34
*** yamahata has quit IRC17:35
derekhbnemec: slagle: instance is gone, best I could get is 2 minutes of strace from the deploy command; it's not stuck, maybe in a loop17:41
derekhhttp://goodsquishy.com/downloads/strace.log17:41
*** ccrouch1 has quit IRC17:43
*** rwsu has quit IRC17:46
*** rwsu has joined #tripleo17:46
*** ccrouch has joined #tripleo17:46
*** ccrouch has joined #tripleo17:46
*** gfidente has quit IRC17:48
*** jistr has quit IRC17:49
ayoungderekh, hey, I saw your commit for retry on the ping test17:54
*** paramite is now known as paramite|afk17:54
ayoungderekh, I was able to reproduce the error on a local machine, I don't think retry is going to help.17:54
derekhayoung: I was just kinda testing to see what happens; we think we found the problem earlier ^^, looks like switching to the fedora cloud image should help us17:56
ayoungderekh, is it a memory constraint issue?17:56
*** yamahata has joined #tripleo17:57
derekhayoung: nope, the cirros image is failing to setup a timer on boot17:58
derekhayoung: we also have a memory issue17:58
derekhayoung: https://review.openstack.org/#/c/273431/17:59
trownderekh: what is the timer on boot thing?17:59
trownthe fedora ping test is not working for me17:59
derekhtrown: this is the console of the instance when its booting http://paste.openstack.org/show/485454/18:00
derekhtrown: its stalls there18:00
derekhtrown: slagle and bnemec seem to have seen it before18:00
* derekh has gotta run, will check back later18:01
*** derekh has quit IRC18:01
trownhmm, I have not seen that. I wonder what is different in CI18:01
*** paramite|afk is now known as paramite18:01
*** bvandenh has quit IRC18:04
slagletrown: it's a known issue with cirros18:04
slagleit could be anything really, environmental18:05
slaglekernel versions, load, etc18:05
trownslagle: have you had success with the fedora based pingtest?18:05
slagletrown: i'm not running the exact same code18:06
slagleit works for me, but i'm using network isolation18:06
slagleso i create the overcloud network slightly differently18:06
slagledunno if that matters18:06
trownhmm... I would think if anything that would have a lower chance of success18:07
slaglewhy?18:07
trownjust more complicated18:07
slagleoh. it works fine18:07
slagle"for me" :)18:07
trownwith the same flavor that the tripleo.sh code makes?18:08
slagledo your oc nodes already have >4gb ram?18:08
*** paramite has quit IRC18:09
*** Marga_ has joined #tripleo18:09
trownnope, they have exactly 418:10
slagletrown: oh snap, we probably need to make that flavor bigger18:10
*** Marga_ has quit IRC18:11
*** Marga_ has joined #tripleo18:11
bnemecslagle: Wasn't the m1.demo flavor created solely for the purpose of booting this image?18:12
slaglebnemec: probably. m1.demo wfm18:13
slagletrown: ^18:13
*** absubram has quit IRC18:13
trownk, rerunning even the cirros test against the same cloud it passed on a moment ago is failing, so maybe both are still flaky18:14
* bnemec fires up an environment so he can get in on this testing fun :-)18:15
slaglebnemec: we will likely have to merge https://review.openstack.org/#/c/273699/ and https://review.openstack.org/#/c/273431/ together18:15
slagleshould we depends-on one or the other?18:15
trownI am running with the shardy single worker overcloud patch too fwiw18:15
bnemecslagle: If we want to test them in the same job or are worried about them not getting merged at the same time.18:18
*** xinwu has quit IRC18:18
trownslagle: bnemec, is `nova console-log` working for either of you? I get nothing for either fedora/cirros18:20
trownactually scratch that it works for cirros... I did not git reset --hard enough18:22
bnemecgit reset harder!18:22
bnemecOr --harder18:22
trowncirros ping is working for me while fedora is not, and I do not even get console logs from fedora18:22
slagleboth of those work for me, but i don't have shardy's patch18:24
bnemecI don't actually test my changes.  I follow the "push and pray" methodology. :-)18:24
bnemecAlso, I've never been able to reproduce any of these pingtest problems locally, so it hasn't been terribly useful in the past.18:25
trownI have had pretty mixed success with the pingtest in rdoci... but I also have had trouble reproducing locally18:26
ayoungderekh, is there a review out there for replacing cirros with fedora cloud image for the ping test?  I can try it out18:26
trownayoung: https://review.openstack.org/#/c/273699/18:26
slagleLOL18:26
trownslagle: are you laughing at me? I have not actually had hard enough time, since we have another one on the way :p18:27
*** penick has joined #tripleo18:27
slagleno i was laughing at ayoung18:27
slagle:)18:27
ayoungslagle, so, in addition to this, there is a Keystone midcycle going on that I am attending remotely (you can guess how well that worked) and I had an insurance inspector show up at my door, meanwhile helping a teammate get ready for presenting on our stuff at devconf. Where am I again?18:29
trownredeploying without the single worker environment18:29
ayoungslagle, I suspect that memory issues are also hurting my setup. I'm tempted to tear down the undercloud on this setup and restore with larger vms18:30
slagleayoung: sorry, it just made me chuckle since we were talking about the patch18:30
ayoungslagle, no worries.  If I had trouble with people laughing at me, I wouldn't be able to get my work done18:30
ayoungI'm just hoping to be able to make a constructive contribution here18:30
*** ccrouch1 has joined #tripleo18:32
bnemecDo we want to just go ahead and merge https://review.openstack.org/#/c/273701/ ?18:34
bnemecIt's passed 2 of the 3 test jobs, and it pretty clearly didn't break the ceph job.18:34
bnemecAnd it should be helpful debugging all of the other changes.18:34
*** ccrouch has quit IRC18:35
*** rasca has quit IRC18:35
slaglewfm.18:35
slagleno one else is around, so i guess we're in charge :)18:36
*** rasca has joined #tripleo18:36
* bnemec merges all the things18:36
bnemecThe shotgun approach to debugging. :-)18:37
*** nico_auv has quit IRC18:38
*** jhenner has quit IRC18:38
*** jhenner has joined #tripleo18:38
*** absubram has joined #tripleo18:38
*** sthillma has joined #tripleo18:39
*** jhenner has quit IRC18:39
*** jhenner has joined #tripleo18:39
*** jhenner has quit IRC18:40
*** shivrao has joined #tripleo18:43
thrash+2 from me. :)18:44
trownthrash: to shotguns or the patch :p18:45
thrashthe patch18:45
thrashmaybe the shotguns too... Depends how desperate we are.18:45
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: Minimise memory usage for deployed overcloud  https://review.openstack.org/27343118:50
slaglebnemec: i added a depends-on here ^ on the fedora cloud image one18:50
openstackgerritMerged openstack/tripleo-common: Output some debug info when pingtest fails  https://review.openstack.org/27370118:50
trownthat ceph job looks like it timed out during the ceph part of deployment18:50
slagleif it passes, we can merge them both18:51
openstackgerritEmilien Macchi proposed openstack/tripleo-docs: First documentation for Operational tools  https://review.openstack.org/26527118:52
openstackgerritEmilien Macchi proposed openstack/instack-undercloud: puppet: fix uchiwa password parameter  https://review.openstack.org/27420218:55
EmilienMdprince: a quick one ^18:55
*** alop has joined #tripleo18:56
dprinceEmilienM: quick review, but we wait on the CI18:56
EmilienMno prob18:56
dprincewhich won't even test this yet18:56
*** pcaruana has joined #tripleo18:57
*** xinwu has joined #tripleo18:57
trownslagle: bnemec, just redeployed without the single worker env and fedora ping test works18:59
slaglei see18:59
trownnot really sure how having a single worker would affect that18:59
slagleguess i'll try it now :)18:59
slaglei added a depends on so those 2 patches run together in ci, so we'll see there as well19:00
trownya I will be interested if we get the same thing with nova console-log not returning anything19:00
slaglegoodbye overcloud, you were a good one19:01
slagleyou lasted about 4 hours19:01
*** tosky has quit IRC19:02
*** devvesa has quit IRC19:04
*** akuznetsov has joined #tripleo19:08
*** akuznetsov has quit IRC19:13
slagletrown: were you testing ha?19:14
trownslagle: ya19:16
* bnemec toddles off to eat while his overcloud deploys19:16
trownha is the only thing that is flaky for me... I am starting to think it is just not possible to test ha on a 32G host19:16
slagle32g ought to be enough19:19
slaglethe problem i run into with ha is having only a 4 core box19:19
slaglewith 5 vm's, plus the host, it's pretty much unusable19:20
trownhmm, maybe that is more of my issue then19:20
*** absubram has quit IRC19:20
trownnot sure what the centosci boxes have, but I bet it is not more than my dell mini, which is 4 cores19:20
*** mkovacik has joined #tripleo19:22
trownI get OOM a lot though testing ha19:24
*** pcaruana has quit IRC19:24
slagleoh aren't you giving the uc a lot more ram in rdoci?19:25
trown12G for the ha job and 16 for the nonha, but I get OOM on the overcloud nodes19:29
*** rbrady has quit IRC19:36
*** pcaruana has joined #tripleo19:37
openstackgerritgreghaynes proposed openstack/diskimage-builder: Perform a booting test for our images  https://review.openstack.org/20463919:37
*** jistr has joined #tripleo19:37
slaglei believe i see the issue with the fedora image patch :)19:39
*** ccrouch1 has quit IRC19:39
slaglehtml documents don't make very good images19:39
*** electrofelix has quit IRC19:40
slaglethat url is redirecting19:40
bnemecAh.19:40
bnemecI just deleted my overcloud instead of the tenant stack again.19:41
bnemecIt's a real problem having that muscle memory for heat stack-delete overcloud.19:41
slaglei almost did that earlier, but luckily i had overcloudrc sourced19:41
*** ccrouch has joined #tripleo19:41
*** eil397 has joined #tripleo19:41
trownslagle: odd... sometimes it doesn't, but ya.. size=344 is suspect for sure :)19:41
*** thrash is now known as thrash|biab19:41
openstackgerritJames Slagle proposed openstack/tripleo-common: Use Fedora image for ping test  https://review.openstack.org/27369919:42
slagleyep :)19:42
bnemecOn a semi-related note, we need to change the default for that heatclient env var.19:43
trownthat also then makes much more sense why we get no console-log19:43
bnemecI keep forgetting to set it and end up with an orphaned stack and network because the ping dies.19:43
trownah ya... I am supposed to file a heatclient bug for that as well19:44
trownkind of surprising to me that `nova boot` returns success to heat when booting a 344 byte html page19:46
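The ping-test failure above came down to the downloaded "image" actually being a 344-byte HTML redirect page, which glance and nova accepted without complaint. A minimal Python sketch of a pre-upload sanity check, assuming the image is meant to be qcow2 (whose files begin with the 4-byte magic `QFI\xfb`). The function name and the 1 MiB minimum size are hypothetical choices for illustration, not something tripleo.sh is known to do:

```python
import os

# qcow2 images start with the magic bytes "QFI" followed by 0xFB.
QCOW2_MAGIC = b"QFI\xfb"

# Hypothetical threshold: anything this small is almost certainly not a
# usable cloud image (the broken download discussed here was 344 bytes).
MIN_IMAGE_BYTES = 1024 * 1024


def looks_like_qcow2(path: str, min_size: int = MIN_IMAGE_BYTES) -> bool:
    """Return True if the file is at least min_size bytes and has a qcow2 header."""
    if os.path.getsize(path) < min_size:
        return False
    with open(path, "rb") as f:
        return f.read(4) == QCOW2_MAGIC
```

Running this on the downloaded file before uploading it to glance would reject an HTML error or redirect page on both counts: it is far too small and it starts with `<htm` rather than the qcow2 magic.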
*** zaneb has joined #tripleo19:46
*** olap has joined #tripleo19:50
*** ccrouch1 has joined #tripleo19:53
trownretrying the fedora ping test with the single worker env19:54
*** ccrouch has quit IRC19:55
*** mkovacik_ has joined #tripleo20:00
bnemecIt's just as well.  OVB wouldn't work at all if it verified that the image was bootable.20:03
bnemecWe start the baremetal nodes out with a completely empty qcow2. :-)20:03
*** mkovacik has quit IRC20:04
openstackgerritSteven Hardy proposed openstack/tripleo-docs: Document using node capabilities to control placement  https://review.openstack.org/27421720:05
*** thrash|biab is now known as thrash20:05
*** pcaruana has quit IRC20:06
openstackgerritMatthew Thode proposed openstack/diskimage-builder: add new cloud-init element  https://review.openstack.org/27376420:07
openstackgerritSteven Hardy proposed openstack/tripleo-docs: Document using node capabilities to control placement  https://review.openstack.org/27421720:07
openstackgerritMatthew Thode proposed openstack/diskimage-builder: add gentoo support to growroot  https://review.openstack.org/27376920:09
*** pcaruana has joined #tripleo20:09
trownbnemec: slagle, confirmed the updated fedora image test worked with the single worker env20:14
*** ccrouch1 has quit IRC20:15
bnemecOh, and I totally typoed the filename too.  .qcow instead of .qcow2.20:16
openstackgerritBen Nemec proposed openstack/tripleo-common: Use Fedora image for ping test  https://review.openstack.org/27369920:17
*** shardy has quit IRC20:18
trownah hehe, yep glance didn't care :)20:18
openstackgerritJames Slagle proposed openstack/tripleo-common: Use Fedora image for ping test  https://review.openstack.org/27369920:19
bnemecI wondered how the upload to glance was going so fast with a 150 MB image. :-)20:19
*** pcaruana has quit IRC20:19
slagleit needed a rebase20:19
slagleapparently20:19
bnemectripleo.sh -- Overcloud pingtest, SUCCESS \o/20:20
*** ccrouch has joined #tripleo20:22
*** Marga_ has quit IRC20:23
*** penick has quit IRC20:24
*** panda_ has quit IRC20:26
*** derekh has joined #tripleo20:26
*** panda_ has joined #tripleo20:27
derekhCI isn't running on shardy's workers=1 patch, as the patch it depends on changed; since it's gotta be restarted anyways, I'm gonna fix the comment20:28
*** jcoufal has joined #tripleo20:28
*** jcoufal has quit IRC20:28
openstackgerritDerek Higgins proposed openstack-infra/tripleo-ci: Minimise memory usage for deployed overcloud  https://review.openstack.org/27343120:30
openstackgerritBen Nemec proposed openstack/instack-undercloud: Remove option of installing tuskar  https://review.openstack.org/26538320:32
openstackgerritBen Nemec proposed openstack/instack-undercloud: Deploy Aodh services, replacing Ceilometer Alarm  https://review.openstack.org/26538220:32
openstackgerritBen Nemec proposed openstack/instack-undercloud: Switch default keystone auth_uri to v3  https://review.openstack.org/26538020:32
openstackgerritBen Nemec proposed openstack/instack-undercloud: Add check for sufficient memory to undercloud install  https://review.openstack.org/26537820:32
openstackgerritBen Nemec proposed openstack/instack-undercloud: Set Nova's ram_allocation_ratio configuration option to "1.0" By default  https://review.openstack.org/26537720:32
*** mkovacik_ has quit IRC20:38
*** Goneri has quit IRC20:43
*** mkovacik has joined #tripleo20:45
ayoungis anyone else getting an error with the openstack cli where doing a list command that has no results reports an out of range error?20:45
bnemecCrap, now our pinned CentOS mirror is not cooperating. :-(20:46
bnemecWhy does nothing work?!20:47
EmilienMit's friday I guess20:47
EmilienMit's a sign you should stop :-)20:47
bnemecayoung: What list command?20:47
ryansball the servers went home for the weekend ;)20:47
bnemecEmilienM: Sounds good. :-)20:48
EmilienMno stay here, we have some work :-P20:48
*** derekh has quit IRC20:51
ayoungbnemec, openstack server list20:52
ayoungbnemec, or any other list command20:52
ayoung$ . ./overcloudrc20:52
ayoung[stack@instack ~]$ openstack server list20:52
ayounglist index out of range20:52
bnemecayoung: Ah, yes.  I'm seeing the same thing.20:52
bnemecSounds like an OSC bug.20:53
ayoungbnemec, yep, and it is a newish one20:53
ayoungI suspect the problem is cliff20:53
prometheanfirecan someone who knows the tripleo gate let me know if this is my fault?  I don't think it is... (gate failure) https://review.openstack.org/27376920:54
dhellmannayoung : add --debug after openstack and before the rest and you should get a traceback20:54
ayoungdhellmann, I did, it was cliff ish20:54
ayoungI'll paste20:54
prometheanfireah, probably the rpb thing still20:54
ayoungdhellmann, http://paste.openstack.org/show/485491/20:55
prometheanfirederekh said he was going to release the fix20:55
prometheanfirehttps://review.gerrithub.io/#/c/261404/20:55
dhellmannayoung : yeah, that looks like a bug in the auto-width stuff that was added recently. file a bug for us?20:57
* bnemec notes that dhellmann must have an IRC watch on cliff :-)20:57
dhellmannbnemec : aye20:57
ayoungdhellmann, was just making sure there was not one already20:57
dhellmannayoung : cool, thanks20:57
ayoungdhellmann, cliff or OSC?20:57
dhellmannayoung : cliff, I think the issue is that the result set is empty so the table is coming back blank, then we're trying to figure out how wide it is20:57
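dhellmann's hypothesis can be illustrated with a small, hypothetical Python sketch. This is not cliff's actual formatter code, only the general failure mode: sizing columns by looking at the first row raises `IndexError: list index out of range` when a list command returns no rows, while deriving widths from the headers first handles the empty case:

```python
from typing import List, Sequence


def column_widths_naive(headers: Sequence[str],
                        rows: Sequence[Sequence[object]]) -> List[int]:
    # Assumes at least one row exists; rows[0] fails on an empty result set.
    first = rows[0]
    widths = [max(len(h), len(str(c))) for h, c in zip(headers, first)]
    for row in rows[1:]:
        widths = [max(w, len(str(c))) for w, c in zip(widths, row)]
    return widths


def column_widths(headers: Sequence[str],
                  rows: Sequence[Sequence[object]]) -> List[int]:
    # Start from the header widths so an empty result set still has widths.
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(str(cell)))
    return widths
```

With `headers = ["ID", "Name", "Status"]` and no rows, `column_widths` returns the header widths `[2, 4, 6]`, while `column_widths_naive` raises `IndexError`, which is the same symptom as `openstack server list` on an empty overcloud.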
ayoungOK... is that on Launchpad as well?20:58
bnemecprometheanfire: It looks like our pinned mirror is having issues.  It's not your patch.20:58
dhellmannayoung : https://launchpad.net/python-cliff20:58
ayoungyep20:58
ayoungdhellmann, https://bugs.launchpad.net/python-cliff/+bug/153977020:59
openstackLaunchpad bug 1539770 in cliff "Empy set causing out of range error" [Undecided,New]20:59
prometheanfirebnemec: neat20:59
dhellmannayoung : thanks20:59
dhellmannayoung : is there any way for you to provide the data set that's being fed into the formatter, to verify my hypothesis? maybe use a different formatter on the command line, like json?21:01
dhellmannor csv or something21:01
bnemecNow the docs job is failing on the fedora image change.21:01
trownwtf21:02
bnemecNot that it matters since apparently the mirror is dead.21:03
bnemecAt least everything is failing fast.21:03
bnemecI've gotta take my silver linings where I can get them right now.21:03
ayoungdhellmann, sure...21:03
ayoungdhellmann, I just missed grabbing it from the output.  Added to the bug report21:04
dhellmannayoung : thanks!21:05
ayoungbnemec, pingtest failed for me with a cherrypick of the fedoraimage change:21:05
ayoung| server1               | 585369f2-7a6e-4eba-be5d-708f2d0e4e46 | ResourceInError: resources.server1: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"                         | CREATE_FAILED      | 2016-01-29T20:47:10 |21:05
bnemecayoung: Did you deploy with --libvirt-type qemu?21:06
ayoungbnemec, not explicitly.21:06
ayoungbnemec, where do I make that change?21:06
bnemecayoung: You need to if you're going to test in a virt environment.21:06
bnemecayoung: Are you using tripleo.sh for the overcloud deploy?21:06
ayoungbnemec, not quite21:07
ayoungbnemec, I ran21:07
ayoung openstack overcloud deploy --template /home/stack/tripleo-heat-templates/21:07
ayoungbecause I was testing the Keystone  HTTPD change21:07
bnemecayoung: Okay, just add --libvirt-type qemu to the end of that.21:07
ayoungk21:07
bnemecopenstack overcloud deploy --template /home/stack/tripleo-heat-templates/ --libvirt-type qemu21:07
ayounglet me make sure keystone worked...21:07
ayoungof course it did...duh21:08
ayoungthe commands I showed above were run against it21:08
ayoungok...let me teardown and retry21:08
*** jistr has quit IRC21:10
openstackgerritBen Nemec proposed openstack-infra/tripleo-ci: Pin to mirror.centos.org  https://review.openstack.org/27424121:11
*** olap has quit IRC21:11
bnemec^ at least has a chance of passing CI.  Maybe.21:12
bnemecIt's got the rack to itself, so it should be super fast. :-)21:13
prometheanfirebnemec: I know it's being worked on, but mind letting me know when I can run recheck?21:15
bnemecprometheanfire: I wouldn't count on it happening before next week. :-(21:16
bnemecOur CI pinned mirror is down, which is blocking testing of all the other patches.21:16
bnemecBecause of course it is.21:16
prometheanfirelesigh21:20
prometheanfirewas really hoping this wouldn't get stuck in the mire that is openstack gating21:21
bnemecHmm, apparently we're still defaulting to the centos 7.0 image.  That's gotta make our image builds take longer.21:22
trownbnemec: not sure it would matter, we are upgrading what feels like every package even on the 7.1 image21:23
bnemecActually, I'm confused now.  We're pointing at the current image, but the size of file we download doesn't match what I'm seeing. :-/21:25
bnemecOh.21:25
* bnemec can't read21:25
bnemecNever mind.  Complete false alarm.21:26
trownhehe21:26
trownat least it's friday21:26
ayoungbnemec, Ummmm not sure if this is a step forward but:  http://paste.openstack.org/show/485495/21:31
bnemecayoung: Yeah, we need to change the default.  Let me find the variable you need to set to fix that.21:32
ayoungbnemec, but  I can ping the vm....so....yes!21:32
bnemecayoung: export OVERCLOUD_PINGTEST_OLD_HEATCLIENT=021:32
bnemecHeat client made a change that broke us.21:32
bnemectrown: Can we just change the default on that since we changed the CI pin so by default we're deploying a client that doesn't work with the current default?21:33
bnemecDefault default default21:33
trownbnemec: ya probably better to have it default to working on mitaka21:33
trownbnemec: but what works on mitaka will not work on liberty and vice versa21:33
bnemecUgh.21:34
trowny21:34
*** rcernin has joined #tripleo21:36
*** jayg is now known as jayg|g0n321:37
*** absubram has joined #tripleo21:41
trownbnemec: filed a heatclient bug for it, though I think it might be a cliff thing https://bugs.launchpad.net/python-heatclient/+bug/153978321:43
openstackLaunchpad bug 1539783 in python-heatclient "backwards-incompatible change to raw format output" [Undecided,New]21:43
openstackgerritBen Nemec proposed openstack/tripleo-common: Detect when we need the alternate heat command  https://review.openstack.org/27426321:50
bnemectrown: ^21:50
bnemecSeems to be working for me locally on the new heat client.21:50
trownnice21:50
trownthat is way better21:50
bnemecError: Could not find dependency File[/etc/httpd/conf/ports.conf] for File[/etc/httpd/conf/httpd.conf] at /etc/puppet/modules/apache/manifests/init.pp:34921:51
bnemecOMFG21:51
bnemecI give up.21:51
*** mkovacik has quit IRC21:53
*** derekh has joined #tripleo21:53
slaglewut21:53
bnemecMy change to switch our pinned centos mirror just failed on that during the undercloud install.21:54
ayoungbnemec, soooo can't run yet21:55
ayoungUnable to create the flat network. Physical network datacentre is in use.21:55
ayoungI should just tear that down by hand?21:56
ayoungstack delete did not clean it up21:56
bnemecayoung: Yeah, you'll need to delete all of the stuff it had created.21:56
bnemecUnfortunately that failure causes it to skip cleanup too.21:56
ayoungERROR: Property error: : resources.server1.properties.image: : Multiple physical resources were found with name (pingtest_image).21:57
ayounghmmmm21:57
bnemecayoung: Delete by UUID.21:57
ayoungimage?21:57
bnemecYou probably have two pingtest_image's in glance.21:57
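Because heat refuses to pick between two images with the same name, the duplicates have to be removed by UUID. A minimal Python sketch for finding those UUIDs, assuming the output of `openstack image list -f json` is a JSON list of objects with `ID` and `Name` keys; the helper name is hypothetical:

```python
import json
from collections import defaultdict
from typing import Dict, List


def duplicate_ids(image_list_json: str, name: str) -> List[str]:
    """Return every image ID that shares `name`, but only if there is more than one."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for image in json.loads(image_list_json):
        by_name[image["Name"]].append(image["ID"])
    ids = by_name.get(name, [])
    return ids if len(ids) > 1 else []
```

Each returned ID can then be passed to `openstack image delete`, which accepts an image ID and avoids the name ambiguity entirely.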
ayoungit still seems to be running...21:57
ayoungkill it and delete?21:58
bnemecayoung: I don't like its odds of completing, so I'd say yes.21:58
ayoungbnemec, should I delete the flavor too?22:00
bnemecayoung: No22:00
bnemecThat was actually created as part of the original deployment.22:01
ayoungOK.22:01
ayoungimage is gone. network is gone.  Anything else?22:01
*** pradk has quit IRC22:01
bnemecayoung: Heat stack?22:01
ayoungdead22:01
ayoungI killed it all good22:01
bnemecayoung: Heat, glance, and neutron are the only things we clean up from the pingtest, so that should do it.22:02
ayoungpinging the floating ips....22:02
ayounghasn't died yet22:02
ayoungcleanup22:02
ayoungbnemec, success.22:05
ayoungbnemec, so, is there something I can do to move this along?  This is all awesome22:05
bnemec\o/ Something is actually working for somebody.22:05
bnemecayoung: I have no idea.  We have so many problems stacked up that are blocking CI that I have no idea when we'll be back in business.22:06
bnemecThe latest one seems to be that puppet-apache has broken us.22:06
openstackgerritxin wu proposed openstack/tripleo-heat-templates: Add extra config yaml files for big switch agents.  https://review.openstack.org/27192222:07
ayoungbnemec, the hacks I needed were the fedora image, setting up with libvirt and the env var to get around the IP test22:07
ayoungok...I'm going to go be a dad now22:07
bnemecayoung: Have a good weekend22:07
openstackgerritxin wu proposed openstack/tripleo-heat-templates: Include big switch puppet modules for deploying overcloud  https://review.openstack.org/27194022:09
*** rlandy has quit IRC22:10
openstackgerritxin wu proposed openstack/tripleo-heat-templates: Include big switch puppet modules for deploying overcloud  https://review.openstack.org/27195322:10
*** Marga_ has joined #tripleo22:12
*** dprince has quit IRC22:12
bnemecWarning: Scope(Concat::Fragment[Listen 80]): The $ensure parameter to concat::fragment is deprecated and has no effect.22:25
bnemecIt's not deprecated if it already has no effect.22:25
derekhbnemec: fact22:26
bnemecYep, the concat module just released.  99% sure that's what broke us: https://github.com/puppetlabs/puppetlabs-concat/commit/55cf9354bb02635e3a54d96cf22c4031751ef67a22:27
*** sbalukoff has quit IRC22:29
derekhbnemec: are you gonna pin it22:32
bnemecderekh: Yeah, just working up the change now.22:33
*** trown is now known as trown|outttypeww22:33
derekhbnemec: ok, if the top of the chain passes we can probably just merge all 4(?) changes22:33
bnemecderekh: I don't even know how many changes are needed.  There are so many things wrong now that I've lost track. :-/22:34
derekhbnemec: If they haven't been merged tonight I'll get them merged in the morning22:34
openstackgerritBen Nemec proposed openstack/tripleo-common: Pin puppet-concat  https://review.openstack.org/27427822:34
*** jdob has quit IRC22:35
bnemecThere's a reason we don't do releases on Friday.22:36
* bnemec remembers that we just released tripleoclient earlier today22:37
bnemecThat hadn't previously been released though, so it wasn't going to break anyone.22:38
derekhbnemec: lets merge the centos mirror switch  https://review.openstack.org/#/c/274241/122:40
derekhbnemec: looks like its getting a lot further22:40
bnemecderekh: Yeah, it at least got further than the other patches.22:40
derekhbnemec: yup, merging22:40
openstackgerritMerged openstack-infra/tripleo-ci: Pin to mirror.centos.org  https://review.openstack.org/27424122:41
bnemecThe first run was a little slow, but once squid has cached all the things it should be better.22:41
*** thrash is now known as thrash|bbl22:41
bnemecI guess I can verify the concat change locally.  It'll probably be faster than CI since my VM is already up and running.22:43
derekhbnemec: ya, I think this may have been better http://mirrors.usc.edu/pub/linux/distributions/centos22:44
derekhbnemec: based on some pings22:44
bnemecderekh: Yeah, I don't know that I have access to the rack so I couldn't really tell what was fastest from there.22:44
derekhbnemec: but as long as the proxy is doing its job it should make little difference22:44
bnemecThat's what I figured.22:45
prometheanfirederekh: hi22:45
derekhbnemec: yup, add yourself to the list to become an admin ;-)22:45
bnemecMaybe we should add a ping to the mirror setting part of the script and fall back to other mirrors if one of them goes down.22:45
prometheanfirederekh: was https://review.gerrithub.io/#/c/261404/ ever finished?22:45
derekhbnemec: good plan22:45
bnemecDamn, it still failed.  Hopefully I just screwed up the override locally...22:46
derekhprometheanfire: the packaging guys want to see a core review on your DIB patch first, so it's less likely it will majorly change22:47
prometheanfirederekh: so my dib patch has to merge first?22:48
prometheanfireit can't merge because it can't pass without that patch22:49
derekhprometheanfire: no it doesn't, just needs some reviews,22:49
derekhthey're happy to merge the packaging change first22:50
derekhjust want to make it more likely they merge the correct thing,22:50
prometheanfireya, it's this one https://review.openstack.org/#/c/273769/22:50
prometheanfireright?22:50
derekhprometheanfire: that looks like it, was that part of a bigger patch yesterday?22:52
prometheanfireyes, I split it22:52
prometheanfireit's going to be hard to get core reviewers to look at it if it's failing22:52
prometheanfiredunno why it's being made so hard22:52
openstackgerritBen Nemec proposed openstack/tripleo-common: Pin puppet-concat  https://review.openstack.org/27427822:52
bnemecDammit, the pin was wrong.22:53
prometheanfireso, dib cores, can you look at https://review.openstack.org/#/c/273769 please?22:53
bnemecOkay, this version looks like it will work.  It got past the previous error in my local run.22:54
derekhprometheanfire: added myself to the review so it will be in my face when I come in next week,22:55
bnemecMan, there isn't even a deprecation warning in the previous version of concat.22:55
prometheanfirederekh: thanks, I don't know why this feels like it's moving so slow :(22:55
*** sbalukoff has joined #tripleo22:57
derekhprometheanfire: unfortunately there is a fairly big backlog of reviews in tripleo land, we need to make things better22:57
prometheanfireah, :(22:57
derekhprometheanfire: at the moment trunk is broken anyways, so we're trying to get that back on track22:58
prometheanfirelol22:58
derekhbnemec: +2, I made the exact same mistake myself last week pinning something else23:00
derekhbnemec: if you've tested locally should we push it through and recheck the other tests, so we have a chance of getting stuff back on track before the next thing crops up?23:01
bnemecderekh: Yeah, it worked for me, and it can't possibly make things worse. :-)23:02
derekhbnemec: don't, I used that line in my comment :-)23:03
*** penick has joined #tripleo23:03
derekh*done23:03
bnemecWhat?!  This was reported to them yesterday and they merged it anyway: https://tickets.puppetlabs.com/browse/MODULES-301823:03
bnemecFFS23:03
* bnemec needs a table to flip23:04
bnemecMaybe several23:04
openstackgerritMerged openstack/tripleo-common: Pin puppet-concat  https://review.openstack.org/27427823:04
* derekh shakes his fist 23:05
derekhbnemec: ok, I rechecked them, will take a look in the morning and see how things have moved on23:07
bnemecderekh: Sounds good, thanks.23:07
derekhbnemec: and thank you23:08
derekhhave a good weekend all23:08
*** derekh has quit IRC23:08
openstackgerritBen Nemec proposed openstack-infra/tripleo-ci: Provide fallback mirrors for centos pin  https://review.openstack.org/27428723:17
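bnemec's earlier suggestion (probe the pinned mirror and fall back to another one if it is down) could look roughly like the Python sketch below. The candidate list reuses the two mirrors mentioned in this log, but the exact base paths are assumptions; the probe is a plain HTTP HEAD request with a short timeout, and the function names are hypothetical. The actual tripleo-ci change may well be a shell script that works differently:

```python
import http.client
import urllib.request
from typing import Callable, Iterable, Optional

# Candidate base URLs in order of preference (paths are assumptions).
CANDIDATE_MIRRORS = [
    "http://mirror.centos.org/centos",
    "http://mirrors.usc.edu/pub/linux/distributions/centos",
]


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers a HEAD request with a non-error status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def pick_mirror(candidates: Iterable[str],
                probe: Callable[[str], bool] = http_probe) -> Optional[str]:
    """Return the first mirror whose probe succeeds, or None if all are down."""
    for url in candidates:
        if probe(url):
            return url
    return None
```

Taking the probe as a parameter keeps the selection logic testable without network access. As derekh noted, with the squid proxy caching packages the choice of mirror matters mostly for availability rather than speed, so "first one that answers" is a reasonable policy.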
bnemecAnd with that, I need a drink.23:18
bnemecMaybe several23:18
*** egafford has quit IRC23:41
openstackgerritDerek Higgins proposed openstack-infra/tripleo-ci: Minimise memory usage for deployed overcloud  https://review.openstack.org/27343123:50