20:00:47 <lifeless> #startmeeting tripleo
20:00:48 <openstack> Meeting started Mon Jun 17 20:00:47 2013 UTC. The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:50 <lifeless> hi everyone
20:00:51 <openstack> The meeting name has been set to 'tripleo'
20:00:59 <dprince> lifeless: hi
20:01:00 <dkehn> hi
20:01:05 <NobodyCam> morning lifeless
20:01:07 <GheRivero> hi all
20:01:49 <lifeless> #topic agenda
20:01:50 <lifeless> bugs
20:01:51 <lifeless> Grizzly test rack status
20:01:51 <lifeless> CI virtualized testing progress
20:01:51 <lifeless> open discussion
20:01:55 <lifeless> #topic bugs
20:02:07 <lifeless> https://bugs.launchpad.net/tripleo/ as usual
20:02:19 <lifeless> we're down to 7 crits
20:02:47 <lifeless> SpamapS: you have 3 of them
20:02:57 <lifeless> SpamapS: care to give a brief status on tehm?
20:03:56 <dprince> lifeless: I think we might be seeing a lot of this w/ TOCI https://bugs.launchpad.net/tripleo/+bug/1166838
20:03:57 <SpamapS> sure let me catch up
20:03:58 <uvirtbot> Launchpad bug 1166838 in tripleo "rabbitmq does not start correctly on boot" [High,Triaged]
20:04:14 <lifeless> ok while SpamapS catches up
20:04:20 <SpamapS> https://bugs.launchpad.net/heat/+bug/1191931
20:04:21 <uvirtbot> Launchpad bug 1191931 in heat "AssertionError when creating a stack." [Critical,New]
20:04:23 <lifeless> I should get https://bugs.launchpad.net/tripleo/+bug/1191714 done today
20:04:27 <uvirtbot> Launchpad bug 1191714 in tripleo "400.Bad.Request..X-Instance-ID.header.is.mising.from.reque " [Critical,Triaged]
20:04:27 <SpamapS> btw, I believe this is breaking heat right now
20:04:28 <lifeless> oh, he's up :)
20:04:37 * lifeless hands the mike to SpamapS
20:05:16 <SpamapS> Hm, I haven't checked, all 3 of those might be already merged, just needing better docs.
20:05:22 <lifeless> SpamapS: ok, and thus tripleo ?
20:05:49 <SpamapS> ok no https://bugs.launchpad.net/tripleo/+bug/1182249 is still ongoing
20:05:50 <uvirtbot> Launchpad bug 1182249 in tripleo "quantum configuration is overly hardcoded" [Critical,In progress]
20:06:04 <SpamapS> that one needs os-apply-config and/or os-refresh-config to have access to the ec2 metadata
20:06:20 <lifeless> dprince: so, several things we can do - we should move stuff out of first-boot and into orc calls
20:06:40 <lifeless> dprince: which we should *anyway* as rabbit is a service we may reconfigure.
20:06:40 <SpamapS> https://bugs.launchpad.net/tripleo/+bug/1183223 is a bit vague and I may need to work on the wording/split it out
20:06:43 <uvirtbot> Launchpad bug 1183223 in tripleo "nova-compute.yaml missing parameters" [Critical,In progress]
20:07:02 <SpamapS> https://bugs.launchpad.net/tripleo/+bug/1183442
20:07:04 <uvirtbot> Launchpad bug 1183442 in tripleo "Heat metadata updates do not work" [Critical,In progress]
20:07:06 <lifeless> dprince: that would ameliorate the toci issue even if we don't diagnose the root issue in Ubunty.
20:07:09 <dprince> lifeless: sure. Just wanted to get that marked as high priority (and potentially being worked on as Derek)
20:07:11 <lifeless> Ubuntu.
20:07:17 <lifeless> dprince: ack
20:07:28 <SpamapS> I think that one is fixed actually
20:07:50 <SpamapS> will need to test and verify, but all of the code is in place to fix it theoretically
20:08:21 <dprince> SpamapS: reference commit?
20:08:26 <lifeless> SpamapS: if that heat bug is hurting us, care to add a tripleo task?
20:10:25 <SpamapS> dprince: there are several, it will take me a while to dig it out
20:10:42 <SpamapS> lifeless: yes I'm doing so. Its basically stopping us dead in the water (might be stopping all users dead)
20:10:54 <dprince> SpamapS: no worries. I can review history myself. Thanks for the update.
20:11:22 <SpamapS> dprince: it was worked around in t-i-e and recently fixed in keystoneclient for good
20:12:14 <lifeless> dprince: a8c2ae7e1506defaa36f035377af2b7b04aaed87
20:12:36 <dprince> lifeless: thanks.
20:12:48 <lifeless> we had tat listed as fixing 1183732
20:12:51 <lifeless> but
20:12:53 <lifeless> I think its teh same
20:13:46 <lifeless> ok, onto the others
20:13:57 <lifeless> bug 1184484 has provisional patches from quantum devs
20:13:58 <uvirtbot> Launchpad bug 1184484 in tripleo "Quantum default settings will cause deadlocks due to overflow of sqlalchemy_pool" [Critical,Triaged] https://launchpad.net/bugs/1184484
20:14:09 <lifeless> I tried to apply them to the HP POC environment
20:14:23 <lifeless> but it made all quantum APIs return empty responses.
20:14:27 <lifeless> Which was undesirable
20:15:06 <lifeless> so I reverted it. My plan is to get the current arc of 'get it up and working without fiddling' going, and then find a couple of spare machines in that environment and do a fresh build.
20:15:32 <mordred> o/
20:15:34 <lifeless> quantum folk have marked https://bugs.launchpad.net/tripleo/+bug/1189385 incomplete.
20:15:36 <uvirtbot> Launchpad bug 1189385 in tripleo "quantum-server hung up it's listening port" [Critical,Triaged]
20:15:43 <lifeless> I'm going to ping them asking what they are missing
20:15:58 <lifeless> #action lifeless to chase 1189385 diagnostics for quantum devs.
20:16:17 <lifeless> bug 1188301 - I think clint was tracking it?
20:16:18 <uvirtbot> Launchpad bug 1188301 in tripleo "keystone kvs driver causes process to grow indefinitely and spin on CPU with thousands of keys in a single python dict" [Critical,Triaged] https://launchpad.net/bugs/1188301
20:16:44 <SpamapS> lifeless: did you see the respond on bug 1184484 ?
20:16:45 <uvirtbot> Launchpad bug 1184484 in tripleo "Quantum default settings will cause deadlocks due to overflow of sqlalchemy_pool" [Critical,Triaged] https://launchpad.net/bugs/1184484
20:16:46 <SpamapS> response rather ?
20:16:58 <SpamapS> lifeless: its the same old upper/lower problem again
20:17:15 <SpamapS> lifeless: oh, bug 1188301 is fixed, keystone defaults to sql now! \o/
20:17:49 <SpamapS> https://review.openstack.org/#/c/32970/
20:18:20 <SpamapS> I linked that to bug 1188378 though
20:18:21 <uvirtbot> Launchpad bug 1188378 in keystone "keystone.token.backends.sql uses a single delete command to flush expired tokens causing replication lag and potential deadlocks" [Medium,In progress] https://launchpad.net/bugs/1188378
20:18:35 <SpamapS> oh actually no I linked both
20:18:37 <lifeless> SpamapS: I'm confused.
20:18:44 <SpamapS> but the bot picked the other bug which I mentioned first
20:18:44 <lifeless> SpamapS: is it fixed, or is it pending review to be fixed?
20:18:53 <SpamapS> lifeless: keystone upstream defaults to sql
20:19:02 <SpamapS> lifeless: we have a config file that still says kvs though
20:19:15 <lifeless> ok, so not fixed for us, because we copied the files.
20:19:18 <SpamapS> lifeless: https://review.openstack.org/#/c/32970/ fixes that
20:19:31 <lifeless> yup
20:19:36 <lifeless> dprince: re percona toolkit
20:19:37 <SpamapS> Just need to work on that
20:19:45 <lifeless> dprince: what other options are there?
20:19:46 <jog0> lifeless: this is why I wanted to shrink the size of the config file
20:20:06 <dprince> lifeless: well... could we fix this in keystone?
20:20:06 <lifeless> jog0: yes, and as I said I'm with you in principle, we just need some care.
20:20:13 <jog0> lifeless: ++
20:20:25 <SpamapS> lifeless: I have a WIP fix in keystone but MySQL doesn't support LIMIT in the IN() sub-query clauses .. so I have to do something mysql specific.
20:20:46 <lifeless> dprince: right now we're broken by default. I'd like to unbreak us first, and get more portable, long term fixes second.
20:20:49 <SpamapS> pt-archiver has been cleaning out our PoC table for a week now
20:20:50 <lifeless> dprince: what do you think of that ?
20:21:27 <dprince> lifeless: this will certainly break our Fedora efforts. That is my main objection.
20:21:42 <lifeless> dprince: what if SpamapS puts a 'if ubuntu' thing around the pt cleaner.
20:21:45 <SpamapS> Frankly I think we should mov to memcached eventually, but that is yet another 3rd party service to scale :p
20:21:50 <lifeless> dprince: fedora will be no worse off - broken is broken.
20:22:22 <lifeless> dprince: but it will build and install and run until you get too much contention with the upstream gc code.
20:22:34 <dprince> lifeless: go for it. I suppose I'd just like to see keystone support this with its database design.
20:22:40 <lifeless> dprince: Me too!
20:22:48 <lifeless> dprince: I just don't want to be hostage to educating them
20:22:50 <SpamapS> dprince: If you have some guidance on how to ask sqlalchemy if I'm using mysql, and then do a specially crafted query in sqlalchemy.. https://review.openstack.org/#/c/32044/ needs your comments :)
20:23:07 <dprince> lifeless: we should probably put a comment in to make note of this as well so that it doesn't confuse people, etc.
20:23:17 <lifeless> dprince: I'm not suggesting we stop caring about it, just that we get the move away from kvs in place
20:23:32 <lifeless> ok, so
20:24:15 <lifeless> #action spamaps to: - make the kvs->sql change still build and run on fedora; ensure there is a bug upstream in keystone about the bad sql behaviour, with medium priority task on tripleo.
20:24:30 <lifeless> dprince: ^ I think that meets all your concerns; if not please feel free to tweak it so that it does.
20:25:01 <lifeless> ok and bug 1191714 I am working on
20:25:02 <uvirtbot> Launchpad bug 1191714 in tripleo "400.Bad.Request..X-Instance-ID.header.is.mising.from.reque " [Critical,Triaged] https://launchpad.net/bugs/1191714
20:25:16 <lifeless> this is fallout from the overcloud changes : it's a setting that has to be different in undercloud and overcloud.
20:25:36 <lifeless> right now any seed cloud/bootstrap cloud built with tripleo will fail metadata access from instances/.
20:26:08 <lifeless> Any other bug stuff to discuss?
20:26:36 <lifeless> #topic grizzly rack status
20:26:46 <lifeless> so our POC has been getting hammered by some test users
20:26:50 <lifeless> which is great.
20:26:58 <lifeless> the only issue they have had so far is the quantum poolsize one.
20:27:25 <lifeless> With the comment spamaps pointed out, we can try switching to that quantum version again
20:27:46 <lifeless> #action lifeless to test the quantum deadlock fix on the POC again.
20:28:09 <lifeless> So, this is pretty good news, the path to near-production was a lot smoother than it might have been :)
20:28:23 <lifeless> anything else on the POC environment ?
20:28:51 <SpamapS> moar PoC racks plz
20:30:25 <lifeless> I have 2 requests to bring the tripleo love to prod racks
20:30:38 <lifeless> they know we're not finished, and caveats etc.
20:30:47 <lifeless> but they want to see how it flies.
20:31:07 <lifeless> so - thats pending other folks bandwidth. Will keep everyone apprised as things eventuate.
20:31:26 <lifeless> #topic CI virtual testing progress
20:31:30 <lifeless> pleia2: tag
20:31:34 <pleia2> hello!
20:31:48 <pleia2> so testing on lxc is moving along
20:32:19 <pleia2> got most networking stuff sorted last week and openstack using boot-stack is mostly running within lxc, just working out some launching issues with some of the services
20:32:40 <lifeless> sweet
20:32:42 <pleia2> at this point I don't foresee us hitting any major blockers
20:32:47 <lifeless> do you need more eyeballs ?
20:33:12 <pleia2> not right at this moment
20:33:16 <pleia2> but soon
20:33:41 <jog0> will the CI virt testing just put test the undercloud/boot-stack in LXC
20:33:52 <pleia2> jog0: that's the plan
20:34:12 <jog0> so no overcloud testing in first pass
20:34:34 <lifeless> jog0: the primary goal is to get baremetal code path test coverage.
20:34:58 <lifeless> jog0: with a little bit of 'image builds properly' and 'tripleo config seems legit' built in.
20:35:07 <lifeless> jog0: -> baby steps.
20:35:20 * jog0 *nod*
20:35:31 <pleia2> yeah, and we're starting off simple just so we have a basic setup (this whole thing has somewhat stalled partially because I've been trying to do all-the-things)
20:35:40 <lifeless> pleia2: ok, please shout in #tripleo when you need someone to eyeballs logs or whatever to help diagnose failure-to-startup.
20:35:50 <pleia2> lifeless: great, thanks :)
20:35:52 <lifeless> #topic open discussion
20:35:53 <dprince> pleia2: is this still using TOCI?
20:36:13 <pleia2> dprince: it's diverged quite a bit, but I hope to pull it back and submit some patches to TOCI
20:36:47 <dprince> pleia2: cool. I sort of went the other way... and we are close to having TOCI driving real bare metal.
20:37:05 <pleia2> dprince: it's doing a lot of the same things, so if we could have some switches built in to handle virtual+lxc it would be great
20:37:20 <pleia2> right now I'm running everything by hand though
20:37:21 <dprince> pleia2: we'll be more resource strapped there... but we are finding good things.
20:37:27 * pleia2 nods
20:37:50 <dprince> lifeless: I've got a couple things I'd like to run past you all
20:38:36 <lifeless> dprince: shoot!
20:39:30 <dprince> lifeless: Okay. First thing this troubleshooting thing.
20:40:09 <dprince> lifeless: I have a review up to make it so that we don't always have to hang the deployment process if something bad happens in the deploy ramdisk.
20:40:40 <dprince> lifeless: we *can't* hang the deploy process for CI. it will kill my resource pools and the failure rate is still way to high.
20:40:54 <dprince> So that is step one. (don't hang it)
20:41:12 <dprince> https://review.openstack.org/#/c/33076/
20:41:49 <lifeless> hanging bad
20:41:58 <lifeless> there is a timeout mechanism for nova-bm
20:42:02 <lifeless> it's off by default...
20:42:02 <dprince> Step two would be to have a simple err message tracking capability. I understand we are working on a proper agent... but in the meantime for the "H" release we need something. So maybe something like this: https://review.openstack.org/#/c/33341/
20:42:40 <dprince> lifeless: A timeout would be good as well. But it is hanging because we call a 'bash' shell inline. That is just plain bad IMO.
20:43:20 <dprince> lifeless: So with the second branch above ^^ we'd essentiall just add a small blip to the nova-bare-metal-deploy helper so we can get and log the message.
20:43:32 <lifeless> +500
20:43:58 <dprince> lifeless: I feel like this is a bit home brew... but I gotta say I can't do much about automating this without these things.
20:44:08 <lifeless> I am totally in favour of this sort of thing; devananda has some reasonable concerns about not changing nova baremetal, but IMO leaving it totally broken is not feasible.
20:44:19 <dprince> lifeless: lastly, we need to get devananda's Nova branch to improve the IMPI power commands in.
20:44:46 <dprince> lifeless: the Nova change is really small. and totally backwards compatible. I'll push it by the end of the day too.
20:44:48 <lifeless> I don't think a super duper agent is needed in the short term : its a nice thing to have, but not a necessary condition for any of this.
20:45:05 <dprince> lifeless: okay. I think we are on the same page.
20:45:24 <dprince> lifeless: Okay. Slightly different topic.
20:45:24 <lifeless> yes, ack on that - I'll dig up that review and see where we are at.
20:45:31 <sdake> lifeless does dib now default to using tmpfs for its building magic?
20:45:32 <dprince> backticks. Can they go away.
20:45:44 <lifeless> dprince: `` -> $() ?
20:45:47 <dprince> I'm much prefer we use the more formal $()
20:45:56 <dprince> lifeless: its a style thing... but yes.
20:46:04 <lifeless> fine by me; add it to HACKING or README or something so it's discoverable.
20:46:21 <lifeless> sdake: yes
20:46:28 <dprince> lifeless: Cool. We don't have a bash HACKING that I know of but I'll take a shot at that.
20:46:41 <sdake> lifeless I guess I'm a dummy but the command line option to run a fedora dib doesn't immediately stick out at me from the -h or readme.md
20:46:42 <lifeless> sdake: see README.md - Requirements.
20:46:56 <lifeless> sdake: disk-image-create fedora
20:47:08 <sdake> thanks I'll try that lifeless ;)
20:47:23 <sdake> sure is fast
20:47:33 <lifeless> sdake: :>
20:47:49 <SpamapS> will be nice with the official F19 and later imges too
20:48:00 <sdake> be sweet if it had an api to go with it :)
20:48:19 <sdake> ya f17 lost cause at this point ;)
20:48:27 <lifeless> sdake: actually I think its still too slow, need to add some parallel in there, as well as make it trivial to setup local pypi and openstack git mirrors; but folk like derekh (not in this channel atm) have that in-progress
20:48:32 <sdake> I plan to change all the heat instances to default to f19 when it comes out
20:49:06 <lifeless> sdake: an API would be nice; I think structurally we should layer that on top - separate concern.
20:49:14 <sdake> lifeless agree
20:51:25 <SpamapS> sounds like we're done :)
20:52:33 <lifeless> agreed
20:52:34 <lifeless> #endmeeting