19:00:30 <devananda> #startmeeting ironic
19:00:31 <openstack> Meeting started Mon Dec 2 19:00:30 2013 UTC and is due to finish in 60 minutes. The chair is devananda. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:35 <openstack> The meeting name has been set to 'ironic'
19:00:37 <devananda> #chair NobodyCam
19:00:38 <openstack> Current chairs: NobodyCam devananda
19:00:45 <NobodyCam> :)
19:00:52 <devananda> #topic greetings & roll call
19:00:55 <devananda> hi all! who's here?
19:00:57 <GheRivero> o/
19:00:58 <lucasagomes> o/
19:01:03 <agordeev> o/
19:01:04 * NobodyCam o/
19:01:08 <rloo> o/
19:01:11 <yuriyz> o/
19:01:29 <devananda> great!
19:01:33 <devananda> for reference, here's the agenda
19:01:34 <devananda> #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting
19:02:02 <devananda> #topic announcements / updates
19:02:13 <devananda> just one today
19:02:42 <devananda> i started writing a consistent hash ring for the conductor service
19:03:03 <devananda> to solve several problems that we ran into over the last two weeks
19:03:10 <lucasagomes> nice, any idea when ur going to put a review up?
19:03:29 <devananda> around routing RPC messages, knowing which conductor is responsible for which instance, and handling failures
19:03:32 <devananda> yes
19:03:34 <devananda> one already landed
19:03:40 <lucasagomes> and btw, do we need it in order to get out of the incubation process?
19:04:24 <NobodyCam> lucasagomes: I think we do. or we'll have to do a lot of re-engineering
19:04:35 * devananda looks for the commit
19:04:35 <lucasagomes> NobodyCam, right
19:04:54 <lucasagomes> right yea def it's important
19:05:03 <devananda> #link https://review.openstack.org/#/c/58607/
19:05:14 <lucasagomes> just asking cause in our last conversation it was unclear whether we would need it or not
19:05:17 <devananda> yea
19:05:40 <NobodyCam> #link https://blueprints.launchpad.net/openstack/?searchtext=instance-mapping-by-consistent-hash <- fyi
19:05:43 <devananda> so i did the mental exercise of: what happens if we don't have this? --> we can only run one conductor instance --> can we exit incubation like that? --> probably not.
19:06:05 <devananda> and decided to dig in and get'er'done
19:06:19 <lucasagomes> great!
19:06:32 <devananda> also talked with the nova folks briefly about that, and got the same impression from them
19:07:05 <vkozhukalov> o/
19:07:24 <lucasagomes> I know tripleo might not need it at the moment, but yea having it done is def a big/great step
19:08:21 <devananda> the lack of this in nova-bm can be worked around with e.g. pacemaker + drbd
19:09:17 <devananda> in theory that could also provide some HA for ironic. but we would still need to restrict it to a single conductor instance (until we add this hash ring)
19:09:26 <devananda> so. bleh. more to do.
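
For context on the design discussed above: the linked review contains the actual implementation. As a rough, hypothetical illustration of the idea (not Ironic's code), a consistent hash ring that maps node UUIDs onto an ordered list of conductors with a small number of replicas might look like the sketch below; the class name, parameters, and replica counts are assumptions for illustration only.

    # Illustrative sketch only -- the real implementation is in the review
    # linked above. Maps each node UUID onto an ordered list of conductors;
    # if a conductor misses heartbeats, callers can simply try the next
    # replica instead of rebalancing the whole ring.
    import bisect
    import hashlib


    class HashRing(object):
        def __init__(self, conductors, replicas=2, partitions_per_host=32):
            self.replicas = replicas
            self._ring = {}          # hash value -> conductor hostname
            self._sorted_keys = []
            for host in conductors:
                for i in range(partitions_per_host):
                    key = self._hash('%s-%d' % (host, i))
                    self._ring[key] = host
                    bisect.insort(self._sorted_keys, key)

        @staticmethod
        def _hash(data):
            return int(hashlib.md5(data.encode('utf-8')).hexdigest(), 16)

        def get_conductors(self, node_uuid):
            """Return up to `replicas` distinct conductors for this node."""
            if not self._sorted_keys:
                return []
            start = bisect.bisect(self._sorted_keys, self._hash(node_uuid))
            hosts = []
            for offset in range(len(self._sorted_keys)):
                key = self._sorted_keys[(start + offset) % len(self._sorted_keys)]
                host = self._ring[key]
                if host not in hosts:
                    hosts.append(host)
                if len(hosts) == self.replicas:
                    break
            return hosts

    # Example usage (names are illustrative):
    #   ring = HashRing(['cond-1', 'cond-2', 'cond-3'])
    #   primary, fallback = ring.get_conductors(node_uuid)
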
19:09:46 <devananda> ok, moving on (we can come back to this in open discussion)
19:09:55 <devananda> #topic action items from last week
19:10:06 <devananda> #link http://eavesdrop.openstack.org/meetings/ironic/2013/ironic.2013-11-25-19.01.html
19:10:20 <devananda> i think 2 of those were romcheg's, and he's not here today
19:10:30 <lucasagomes> I ported the last bug on the list at the whiteboard today
19:10:42 <lucasagomes> gotta find some more things that might need to be ported from nova-bm havana to ironic
19:10:54 <devananda> dkehn and NobodyCam and I talked about the nova-network api bits, and I think it's clear. dkehn, any updates?
19:10:56 <lucasagomes> #link https://review.openstack.org/#/c/59493/
19:11:04 <devananda> lucasagomes: awesome, thanks!
19:11:12 <dkehn> nothing that I've heard
19:11:14 <NobodyCam> lucasagomes: great!
19:11:45 <dkehn> I think everyone is watching the neutron stabilization progress
19:12:09 <devananda> dkehn: when do you think you'll have some code for the pxe driver -> neutron integration?
19:12:37 <dkehn> working on it presently, ran into issues with bringing up the dev env
19:12:47 <dkehn> working with NobodyCam to resolve
19:12:53 <devananda> ack
19:13:21 <dkehn> but the 1st stage is just the PXE data, then will work the rest assuming no issues with the env
19:14:24 <NobodyCam> dkehn: I will be afk a good chunk of today, hit me on gtalk if you have questions
19:14:35 <devananda> as far as oslo/sqla-migrate vs. alembic, folks seem to still like alembic in principle, but i don't have any concrete "yes we should move" answer yet
19:14:36 <dkehn> k
19:15:20 <devananda> #topic integration and testing
19:15:53 <devananda> romcheg isn't here today, and he's been doing most of the work in this area
19:16:19 <devananda> yuriyz: don't suppose you have any updates from him?
19:16:26 <NobodyCam> I have been working on the nova integration
19:16:42 <NobodyCam> we are making progress.
19:16:51 <yuriyz> no updates from romcheg
19:17:02 <vkozhukalov> we are going to make a proof-of-concept scheme for integration testing
19:17:06 <NobodyCam> Exposing node deploy() will be a biggie
19:17:33 <devananda> vkozhukalov: are you working with -infra on that?
19:17:59 <vkozhukalov> something like launching one VM, installing ironic on it, then launching another VM and booting it from the first one
19:18:23 <vkozhukalov> devananda: no, we just started to do that
19:18:44 <devananda> vkozhukalov: there's a lot of work that has been / is being done around testing tripleo in -infra, which means using nova-baremetal. much of that work can probably be used in the same way for testing ironic
19:19:16 <devananda> vkozhukalov: will you be around after this meeting? we should chat with the infra team :)
19:19:31 <vkozhukalov> devananda: ok, we can
19:19:36 <devananda> vkozhukalov: great, thanks!
19:19:43 <devananda> #topic nova driver
19:19:53 <NobodyCam> oh thats me
19:19:55 <devananda> (skipping the client because NobodyCam has to leave soon -- will come back to it)
19:20:03 <devananda> NobodyCam: hi! how goes it?
19:20:12 <NobodyCam> we are making progress. :-p
19:20:22 <NobodyCam> can we jump to api
19:20:29 <devananda> oh. sure
19:20:39 <devananda> #topic python-ironicclient & API service
19:20:43 <devananda> lucasagomes: that's you!
19:20:49 <NobodyCam> lucasagomes: you have thoughts on deploy?
19:20:49 <lucasagomes> vkozhukalov, might be worth taking a look at https://github.com/openstack-infra/tripleo-ci
19:20:56 <lucasagomes> oh right, so as NobodyCam mentioned
19:21:11 <lucasagomes> we need to expose a way to trigger the node deploy from the API/client libraries
19:21:16 <lucasagomes> I thought about something like
19:21:48 <lucasagomes> POST /nodes/<uuid>/deploy returning 202 with the Location header field pointing to /nodes/<uuid>/state in case the request gets accepted
19:22:00 <lucasagomes> 403 in case the deployment was already triggered and not completed
19:22:06 <lucasagomes> also we need a way to abort
19:22:19 <lucasagomes> so folks could do a DELETE /nodes/<uuid>/deploy
19:22:26 <NobodyCam> lucasagomes: will deploy be sync or async?
19:22:28 <lucasagomes> to abort the operation
19:22:31 <lucasagomes> async
19:22:36 <lucasagomes> that's why 202 + location
19:22:55 <lucasagomes> location = they can look at the state resource to see which state the node is currently in
19:23:01 <devananda> lucasagomes: what will be in the POST body?
19:23:02 <lucasagomes> + the target state
19:23:26 <lucasagomes> devananda, didn't think about it yet
19:23:35 <lucasagomes> just started thinking about how it would work at the end of today
19:23:42 <devananda> NobodyCam: you'll need to have a while loop in the nova driver, polling nodes/<uuid>/state, to see when it reaches "done", or if it errors, and also tracking some timeout in Nova
19:23:42 <lucasagomes> so there are some gaps, just the initial idea
19:23:44 <NobodyCam> so nova driver will have to poll
19:23:50 <NobodyCam> yep
19:24:35 <NobodyCam> if nova times out (i.e. a very long deploy) we will be able to roll back / delete
19:24:45 <rloo> does the nova driver use the ironicclient, or issue a POST directly?
19:24:46 <devananda> NobodyCam: e.g., https://github.com/openstack/nova/blob/master/nova/virt/baremetal/pxe.py#L455
19:24:53 <devananda> rloo: ironicclient
19:24:58 <lucasagomes> rloo, it uses the ironic client libs
19:24:59 <NobodyCam> currently ironic client
19:25:04 <NobodyCam> :)
19:25:04 <rloo> hmm, what about e.g. a --poll option, like nova boot has?
19:25:21 <devananda> rloo: that's a CLI thing
19:25:40 <devananda> rloo: CLI and API are distinct, even though they're packaged together
19:26:01 <NobodyCam> ya that loop really needs to be in nova
19:26:05 <rloo> oops.
19:26:08 <NobodyCam> *nova driver
19:26:20 <devananda> nova driver wraps the client API. the CLI also wraps the client API,.... BUT the CLI shouldn't include any "deploy" method
19:26:22 <romcheg_> Is the Ironic meeting still here?
19:26:27 <devananda> romcheg_: hi! yes
19:26:34 <NobodyCam> yea
19:26:39 <NobodyCam> hi romcheg_ :)
19:26:46 <lucasagomes> yea just the lib will contain the method to trigger the deployment
19:26:57 <lucasagomes> cli won't expose it
19:26:58 <romcheg_> Hi, sorry for being late. Street meeting took more time :)
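
As a concrete illustration of the polling devananda describes above (the while loop in the nova driver), here is a minimal sketch against the *proposed* endpoint, written with the plain requests library rather than python-ironicclient. The resource path, the 'provision_state' field, the state names, and the timeout values are all assumptions for illustration, not the final API.

    # Sketch of the nova-driver-side polling loop; field names, state names
    # and timeouts are assumptions, not the final API.
    import time

    import requests

    IRONIC_API = 'http://ironic-api:6385/v1'


    def wait_for_deploy(node_uuid, timeout=1800, interval=10):
        deadline = time.time() + timeout
        while time.time() < deadline:
            resp = requests.get('%s/nodes/%s/state' % (IRONIC_API, node_uuid))
            resp.raise_for_status()
            state = resp.json().get('provision_state')
            if state == 'active':       # deploy finished
                return
            if state == 'error':        # deploy failed
                raise RuntimeError('deploy of node %s failed' % node_uuid)
            time.sleep(interval)
        # on timeout, Nova would roll back / delete the instance
        raise RuntimeError('timed out waiting for node %s to deploy' % node_uuid)
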
19:27:00 <devananda> romcheg_: we can come back to your things in a few minutes
19:27:24 <devananda> lucasagomes: I think a POST with 202 is fine in principle
19:27:42 <NobodyCam> I'm good with polling
19:27:44 <lucasagomes> right, it's also good to say that
19:28:01 <lucasagomes> WSME right now doesn't support returning a Location in the HTTP header
19:28:03 <NobodyCam> so long as we can "break/stop" the deploy
19:28:10 <lucasagomes> #link https://bugs.launchpad.net/wsme/+bug/1233687
19:28:12 <uvirtbot> Launchpad bug 1233687 in wsme "Return Location with POST 201 return code" [Wishlist,New]
19:28:24 <lucasagomes> same prob for 202 ^
19:28:25 <devananda> hah
19:28:27 <devananda> ok
19:28:41 <lucasagomes> so people can go there and click the "affects me" button :P
19:28:44 <devananda> so we can just do that in the nova driver anyway
19:28:58 * devananda clicks "affects me"
19:29:01 <NobodyCam> lucasagomes: I could build the link
19:29:08 <NobodyCam> ya
19:29:19 <lucasagomes> NobodyCam, yes you can build it, np
19:29:40 <devananda> i don't think you need to build it, really
19:29:49 <lucasagomes> yea build it = call a method in the lib
19:29:51 * NobodyCam is running short on time
19:29:51 <devananda> the client lib already has a node.state object, ya?
19:29:53 <devananda> right
19:30:24 <devananda> NobodyCam: go if/when you need to. i can fill you in later
19:30:46 <NobodyCam> devananda: TY ... sorry for running out 1/2 way thru..
19:31:09 <lucasagomes> NobodyCam_afk, see ya later
19:31:26 <devananda> lucasagomes: have more to discuss on the API / client libs?
19:31:34 <lucasagomes> devananda, not from me
19:31:41 <lucasagomes> if there are no objections I will start working on that tomorrow
19:31:48 <devananda> lucasagomes: ++
19:31:52 <lucasagomes> so NobodyCam_afk can start using it asap
19:31:58 <devananda> anyone else, questions on API / client?
19:32:12 <rloo> what about a way to interrupt it?
19:32:30 <lucasagomes> rloo, it will use DELETE to abort the operation
19:32:41 <lucasagomes> so the same way you POST to that resource to trigger the deploy
19:32:47 <lucasagomes> you can DELETE to abort
19:32:55 <rloo> Ok. (I have to admit, i don't know what already exists.)
19:32:58 <devananda> as far as an API goes, I think that's reasonable
19:33:18 <devananda> i'm not sure how easily we can get the plumbing to actually interrupt an in-progress deploy
19:33:38 <lucasagomes> yea, that will be another challenge :)
19:33:50 <lucasagomes> probably solved by the way the ramdisk will do things
19:33:53 <devananda> and we probably shouldn't implement DELETE /nodes/<uuid>/deploy until we can actually satisfy that request
19:33:57 <lucasagomes> like asking for the next steps
19:34:10 <devananda> perhaps
19:34:16 <devananda> my concern is more around the node locking
19:34:57 <lucasagomes> like aborting not releasing the node?
19:35:13 <devananda> whether DELETE // interrupt is async or sync, we'll still have the problem that the node resource is locked by the greenthread which is doing the deploy
19:35:42 <devananda> we can't just go update the DB record while that's going on and expect deploy() to behave reasonably
19:36:15 <lucasagomes> oh yea, aborting will need efforts in a couple of areas
19:36:24 <lucasagomes> most important maybe is the ramdisk
19:36:30 <lucasagomes> how it will know it has to abort etc
19:36:39 <devananda> right. I think DELETE of an in-progress deploy should wait until we can look more into those areas, and it's not needed for coming out of incubation
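
A matching sketch of triggering the deploy and working around the WSME Location-header limitation discussed above by "building the link" on the caller's side. The paths and status codes follow the proposal in this discussion; the helper itself is hypothetical, not part of any existing library.

    # Sketch: trigger the proposed deploy resource and fall back to
    # constructing the state URL ourselves while WSME can't return Location.
    import requests

    IRONIC_API = 'http://ironic-api:6385/v1'


    def trigger_deploy(node_uuid):
        resp = requests.post('%s/nodes/%s/deploy' % (IRONIC_API, node_uuid))
        if resp.status_code == 403:     # deploy already in progress
            raise RuntimeError('node %s is already being deployed' % node_uuid)
        resp.raise_for_status()         # expect 202 Accepted
        # WSME can't set the Location header yet (bug 1233687), so build it
        return resp.headers.get('Location',
                                '%s/nodes/%s/state' % (IRONIC_API, node_uuid))
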
19:37:00 <lucasagomes> cool, so do not expose DELETE for now?
19:37:13 <lucasagomes> or expose it and raise a NotImplemented error?
19:37:28 <devananda> I would not expose it yet
19:37:41 <lucasagomes> right
19:37:47 <romcheg_> +1 for not exposing DELETE
19:37:55 <devananda> NotImplemented vs NotFound. I prefer the latter
19:38:00 <rloo> I was going to suggest exposing/raising an error, and adding a comment why.
19:38:45 <rloo> or have some place so someone knows why DELETE doesn't exist yet.
19:39:03 <lucasagomes> rloo, I think the main thing is that we don't need it for coming out of incubation, so we can add it after (and also docs about it)
19:39:09 <devananda> we can certainly add inline documentation in the API code about it
19:39:16 <devananda> and it may be worth adding a BP to track the intent for it
19:39:17 <lucasagomes> yea like a TODO there
19:39:29 <rloo> yeah, i understand. but it is hard to know, during all this progress, what is avail, not avail, and why.
19:39:31 <lucasagomes> devananda, if u want to give me an action to write a bp
19:39:49 <devananda> #action lucasagomes to file a BP for DELETE /nodes/<uuid>/deploy
19:40:03 <lucasagomes> rloo, do you think a TODO in the code explaining our intentions and why it's not implemented at the moment would be enough?
19:40:16 <rloo> yup. enough for me anyway :-) thx.
19:40:25 <lucasagomes> ok will do that :)
19:41:19 <devananda> ok, moving on
19:41:23 <devananda> romcheg_: still around?
19:41:48 <romcheg_> devananda: yup
19:42:20 <devananda> #topic integration and testing
19:42:22 <romcheg_> Actually I do not have a lot of updates.
19:42:52 <devananda> romcheg_: give us what you've got :)
19:43:16 <romcheg_> I rebased my patch to infra-config on top of clarkb's and am waiting until that refactoring is finished
19:43:54 <clarkb> romcheg_: my change just got approved, it needs a little babysitting, but once we are happy with it, your change will be reviewable
19:43:55 <romcheg_> The tempest patch does not attract a lot of people unfortunately
19:44:20 <romcheg_> clarkb: Cool. Will take a look at that in the morning
19:44:28 <devananda> romcheg_: if i understand correctly, after your infra/config patch lands, we should have some devstack tests in ironic's pipeline, yes?
19:45:31 <romcheg_> devananda: as we discussed previously, we will add tempest tests for Ironic to the gate and check pipelines for Ironic, and to the experimental pipeline for tempest
19:45:48 <devananda> romcheg_: right
19:46:32 <devananda> romcheg_: i'm wondering if there are any other dependencies, besides https://review.openstack.org/#/c/53917, to get it working in the ironic pipeline
19:47:25 <romcheg_> devananda: No, only this configuration change and the tests
19:47:39 <devananda> great
19:48:09 <devananda> #topic open discussion
19:48:15 <lucasagomes> aight
19:48:24 <devananda> look! a whole 12 minutes for open discussion today :)
19:48:29 <lucasagomes> devananda, is it part of Ironic's plans to get metrics from other devices (e.g. storage arrays) just like we will be getting metrics for servers (via IPMI)?
19:48:36 <romcheg_> I continuously check the tests against the latest Ironic to detect any changes that broke Ironic
19:48:52 <devananda> lucasagomes: only devices which ironic is managing/deploying to
19:49:03 <lucasagomes> right
19:49:16 <romcheg_> Hopefully everything works now, so as soon as those two patches land, we will have tempest tests for Ironic
19:49:28 <devananda> lucasagomes: there was some interest in having ironic able to deploy firmware // small OS images to network switches (e.g., open compute / ODCA stuff)
19:49:46 <devananda> lucasagomes: which I generally don't think we're ready for. but that kinda touches on the same space as your question
19:49:47 <lucasagomes> so we plan to do things on switches? e.g. that would be one of the cases of devices we would be able to control using ironic?
19:49:49 <lucasagomes> right
19:50:18 <lucasagomes> that makes sense to me
19:50:45 <devananda> lucasagomes: the whole "configure the hardware" bit gets very weird if ironic starts configuring switches and SAN
19:51:04 <devananda> lucasagomes: i really don't think it should do that. we have other services for that
19:51:45 <lucasagomes> right yea we should not start squashing a lot of things into ironic for sure
19:51:49 <lucasagomes> focused tools
19:52:09 <devananda> lucasagomes: right. OTOH, if someone wants to install a new OS on their switch, well, _that_ is Ironic's domain
19:52:28 <devananda> but it exposes some really weird questions
19:53:03 <devananda> talking to Nova's API to have Ironic deploy an image from Glance onto their hardware switch, then using Neutron to configure that switch
19:53:15 <lucasagomes> yea, it's not something for icehouse for sure, but in the future we might need to start discussing things like it
19:53:17 <devananda> but we needed some networking in order to do the deploy in the first place ....
19:53:37 <devananda> definitely worth longer discussions
19:54:20 <lucasagomes> devananda, another question re consistent hashing... I saw ur implementing the hashring class, we r not going to use any lib that does it already? any reasons for that, lack of py3 support?
19:54:44 <lucasagomes> devananda, +1 for discussions
19:55:30 <devananda> lucasagomes: i found 2 py libs out there, one of which was unmaintained, and none within openstack yet
19:55:40 <devananda> i looked at swift's hash ring
19:55:55 <devananda> and talked with notmyname to get some ideas, but that code is too tightly coupled to swift's particular needs
19:56:17 <devananda> so this is coupled to our needs, and the ring code itself is pretty small
19:56:30 <lucasagomes> right
19:56:34 <devananda> the complexity is going to be in the routing and rebalancing code that I'm working on now
19:56:54 <lucasagomes> that has to do with the list of dead conductors?
19:56:57 <devananda> yes
19:57:05 <lucasagomes> right yea I understand
19:57:18 <lucasagomes> good stuff :)
19:57:21 <devananda> e.g., a conductor misses a few heartbeats -- don't take it out and rebalance everything. just skip it and talk to the next replica
19:57:58 <devananda> i think we only need to do a full rebalance in two cases: new conductor joins the ring; admin removes a conductor from the ring
19:57:59 <lucasagomes> anything about the number of replicas?
19:58:08 <lucasagomes> I saw that they tend to use loads of replicas to make it faster
19:58:31 <devananda> lucasagomes: in swift, sure. in ironic, more replicas won't make deploys faster or anything
19:58:35 <lucasagomes> devananda, ahh thats interesting yea, if someone joins
19:58:36 <devananda> just means more resilience to temporary failures
19:58:53 <lucasagomes> we would need to rebalance and set the nodes to be controlled by specific conductors
19:59:22 <devananda> rebalance will redistribute the nodes across conductors (with the appropriate drivers)
19:59:39 <devananda> it's not a manual admin-has-to-move-nodes thing
20:00:08 <devananda> and the conductor<->node relationship isn't stored in the DB (apart from TaskManager locks)
20:00:38 <devananda> i think we'll need some good docs for the hash ring stuff
20:00:39 <lucasagomes> oh yea otherwise it would be even more complicated to do a takeover
20:00:47 <devananda> so i'm going to work on diagrams today to explain them
20:00:53 <lucasagomes> cool
20:00:57 <lucasagomes> looking fwd to seeing some patches coming
20:01:19 <devananda> i'll un-draft the patch i have once i've cleaned it up a bit
20:01:45 <lucasagomes> great :)
20:02:19 <devananda> anything else? we're a tad over time now
20:02:49 <devananda> ok, thanks all!
20:02:58 <devananda> #endmeeting
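
Footnote on the failover behaviour described at the end of the meeting (skip a conductor that misses heartbeats rather than rebalancing): reusing the illustrative HashRing sketch from earlier, routing could look roughly like this; the function and the is_alive heartbeat check are hypothetical, not Ironic's actual code.

    # Illustrative only: pick the first live replica for a node instead of
    # rebalancing the ring when a conductor misses a few heartbeats.
    def pick_conductor(ring, node_uuid, is_alive):
        """ring: HashRing sketch above; is_alive: callable(hostname) -> bool."""
        for host in ring.get_conductors(node_uuid):
            if is_alive(host):
                return host
        raise RuntimeError('no live conductor available for node %s' % node_uuid)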