20:03:53 #startmeeting tripleo
20:03:54 Meeting started Mon May 13 20:03:53 2013 UTC. The chair is devananda. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:03:55 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:03:57 The meeting name has been set to 'tripleo'
20:04:04 #topic bugs
20:04:21 i'm'a just follow the agenda lifeless posted on the wiki, and let others do a lot of the talking hopefully :)
20:04:28 #link https://wiki.openstack.org/wiki/Meetings/TripleO
20:04:41 who all is here, anyway?
20:04:51 \o
20:04:58 .o/
20:05:45 * devananda pokes a few people in other channels
20:05:46 \o_
20:06:25 so, bugs.....
20:06:29 #link https://bugs.launchpad.net/nova/+bugs?field.tag=baremetal
20:06:30 and
20:06:49 #link https://bugs.launchpad.net/tripleo/+bugs
20:06:53 there's lots of them!
20:07:00 a productive week. :)
20:07:13 o/
20:07:19 echohead_: you're going to fix all the bugs, right? :)
20:07:31 sweet
20:07:53 * echohead_ grabs his flyswatter to squash all the bugs
20:08:13 anyone want to bring up specific bugs to talk about?
20:08:37 i haven't looked through them all yet, but a lot are marked high or critical, so i am imagining there are some things to discuss :)
20:08:57 seems we have some "server gets wedged" bugs that are probably the most serious
20:09:14 Heat does not deal well at all with the ERROR state or with deleted servers.
20:09:23 working on that one after we get heat into boot-stack entirely.
20:09:49 I only had one comment, which was on the text mode console kernel options one - I don't think it's going to be fixed upstream. lifeless' comments suggest that we should set it ourselves in the base element. I am going to do that unless objections arise
20:10:14 agreed about the 'server gets wedged' bug(s) being the most urgent atm.
20:10:17 #link https://bugs.launchpad.net/tripleo/+bug/1178112
20:10:18 Launchpad bug 1178112 in tripleo "baremetal kernel boot options make console inaccessible on ILO environments" [Critical,Triaged]
20:10:19 +1, though we should all go +1 the bug as well
20:11:16 Ng: ++ to fixing it
20:12:03 Ng: in testing that bug, what are you using to get a console?
20:12:10 Ng: nova commands or something else?
20:12:34 SpamapS: any particular "server gets wedged" bugs you want to point out?
20:13:14 devananda: no, I'm mostly referencing things overheard
20:13:30 devananda: so actually the easiest way to test it I've found so far, is to run an image with kvm -curses. Can't show the console in a terminal if it's graphical, however, I will test in a nova to make sure it doesn't interfere with the console log getting, but I don't think it should do, that will still be specified as the last console= on the commandline and so win upstart's heart
20:14:06 devananda: in the test rack, ~20% of nova boots fail. i'm not sure that it has been narrowed down enough yet to produce a valuable bug report.
20:14:23 not clear yet if it is isolated to specific machines, or what.
20:14:26 Ng: gotcha. in theory, baremetal driver has some support for textconsole... i'd be interested to know if that actually works :)
20:14:27 To help split up the work, can I suggest folks who are knowledgeable about the bug and/or nova make sure there is enough information in each bug to make them actionable?
20:14:40 cody-somerville: ++
20:15:37 also, if you're going to work on a bug, please assign it to yourself (should go without saying... :) )
20:15:39 devananda: huh, ok, I'll have a look at that too
20:15:53 don't know if this is a good idea or not, but I am working on https://bugs.launchpad.net/nova/+bug/1178378 and don't have enough experience to really get my head around it, if someone was willing to pair with me to get it done, I think I would learn a lot from that
20:15:54 Launchpad bug 1178378 in tripleo "confused baremetal instance thinks its off, is clearly operational" [Critical,Triaged]
20:16:00 echohead_: I have heard similar numbers from others using similar hardware.
20:16:12 interesting
20:16:19 failing that I will keep plugging away
20:16:19 echohead_: my inclination is to guess that it's 20% of machines are bad -- or just have the wrong info
20:16:56 i think we should have an action item to see if that is the case, and if so, to un-enroll the problematic machines.
20:17:17 #action echohead_ to determine if test rack failures are machine specific
20:18:03 anteaya: I volunteer NobodyCam to help with that :) (assuming he doesn't mind)
20:18:14 yay NobodyCam
20:18:21 does that work for you?
20:18:42 * NobodyCam looks at bug
20:18:49 it would cut down on my aimless flailing
20:18:55 Is the 30% fail to boot thing different from lp #1178586?
20:18:56 Launchpad bug 1178586 in nova "scheduling failures leave baremetal instances stuck in BUILDING" [Medium,Triaged] https://launchpad.net/bugs/1178586
20:18:58 *20%
20:19:33 NobodyCam: I planted a flag here: https://review.openstack.org/#/c/28817/
20:19:41 not much to it, but a beginning
20:19:41 cody-somerville: i think that's a case where the bug doesn't have enough info for me to know what it is
20:19:50 lp #1178919 is also related if not the same bug
20:19:51 Launchpad bug 1178919 in tripleo "instances get stuck in 'BUILDING' sometimes" [High,Triaged] https://launchpad.net/bugs/1178919
20:20:02 cody-somerville: the failures look a bit different, in that they are stuck in 'spawning', as opposed to 'scheduling', as in the bug.
20:20:45 echohead_: like "| 0a171cbe-0f3c-40d5-ae8d-606f1dde41ce | test-0a171cbe-0f3c-40d5-ae8d-606f1dde41ce | BUILD | spawning | NOSTATE | | "?
20:21:05 yep
20:21:36 yea, 1178919 and 1178586 appear to be the same
20:22:13 marked as sup
20:22:14 dup
20:22:17 devananda: Just noticed that one says it's stuck in spawning and the other in scheduling.
20:22:18 devananda: anteaya and I will look into what's up
20:22:33 thanks NobodyCam
20:23:22 cody-somerville: gah, thanks
20:23:25 devananda: may be the same problem (booting too many things at once maybe?) or slightly different - I see lifeless speculates that the one stuck in scheduling might not be bm related.
20:23:43 that isn't a stuck-in-scheduling issue
20:23:55 the scheduler gave up after trying 3 times unsuccessfully
20:24:09 each attempt tried to perform a deploy
20:24:10 failed
20:24:16 and was deleted by the scheduler
20:24:31 What about 'stuck in deleting' ?
20:24:50 I have that right now. :-P
20:25:00 hah
20:25:12 SpamapS: lifeless had success with cleaning up stuck 'deleting' instances manually.
20:25:19 ugh
20:26:31 ok, http://paste.ubuntu.com/5650528/ shows the compute log for the scheduling failure
20:26:54 * SpamapS unfortunately has conflicting things now and so will just be lurking
20:28:09 it is different indeed. the spawn failure doesn't look like it ever powered on. the schedule failure powered on then failed in deploy
20:28:50 huh?
20:28:59 there goes the bot I do believe
20:29:02 i was about to say, enough with bugs let's move on
20:29:05 grrr
20:29:12 #topic test rack
20:29:21 yay! bot's still alive
20:29:25 bot is good, just chanserv rejoined
20:29:31 cool
20:29:33 or came up
20:29:35 good bot
20:29:49 test rack is coming along nicely, i expect heat to be running there by eod, hopefully.
20:30:09 hopefully the nova-boot failures are specific to certain boxes, and can be removed.
20:32:05 echohead_: you needed me to look at some heat / t-i-e reviews, yes?
20:32:35 devananda: yes, that would be good. i will be applying those pending changes on the test rack to bring up heat.
20:32:53 #action devananda to review t-i-e heat changes
20:32:55 k, will do
20:33:08 anything else to discuss on the rack?
20:33:09 seems like getting the openstack service heat templates working is the next thing in the critical path, which can proceed in parallel in a virtual-bm environment.
20:33:51 also, we will need a first-boot script to write the /etc/network/interfaces appropriately for openstack nodes on the machines.
20:34:11 echohead_: ah, is that related to the baremetal-always-does-file-injection bug?
20:34:31 or put another way, if bm didn't do file injection, would you still need to write that script?
20:34:45 devananda: i think we would still need it even without the bug.
20:34:52 because we must configure vlan interfaces, etc.
20:35:03 i see
20:35:48 then i won't prioritize fixing _that_ bug
20:35:53 :)
20:35:56 #topic open discussion
20:36:24 fyi wrapping up the quantum PXE changes
20:36:32 awesome
20:36:38 for review beod
20:36:50 and the quantum client
20:37:05 will go next to the nova changes to talk to it
20:37:09 please drop links in #tripleo for that when it's up
20:37:17 yes sir
20:37:37 i can help with the nova changes
20:37:41 once i see what it has to talk to
20:37:44 will be going silent for a few days at the end of the month
20:38:01 transition to Europe
20:38:19 sounds fun!
20:38:27 fingers crossed
20:38:41 dkehn: safe travels
20:38:45 tx
20:38:47 thx
20:39:14 i've been mostly occupied with lots and lots of hacking on Ironic
20:39:21 for the last week and all weekend
20:39:28 and will probably continue to be so consumed :)
20:39:34 gotta love the name, quantum is still battling with legal
20:39:38 over a name
20:39:40 yea :(
20:39:52 i'm actually still waiting on foundation to sign off on "Ironic"
20:40:01 but my own research says it's not TMd in the US
20:40:14 Alanis Morissette begs to differ.
20:40:19 oh wait, that's canada.
20:40:45 if folks need me for things, don't hesitate to poke (not that anyone has hesitated anyway, just sayin)
20:40:46 do not invoke that name
20:41:01 echohead_: i think the song title is "isn't it ironic" :)
20:41:06 no no no
20:41:10 it shall not be said
20:41:13 :P
20:41:15 devananda: ok, guess we're clear then :)
20:42:03 any other topics?
20:42:11 or shall we wrap up early again?
20:42:29 :)
20:42:37 when I logged into IRC tripleo it said meeting at 2000 UTC, which time is it
20:42:57 nevermind
20:43:11 dkehn: :)
20:43:28 well, thanks all!
20:43:32 devananda, so that is the real time?
20:43:40 great chairing devananda
20:43:42 #endmeeting