19:03:18 <devananda> #startmeeting ironic
19:03:19 <openstack> Meeting started Mon May 27 19:03:18 2013 UTC. The chair is devananda. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:22 <openstack> The meeting name has been set to 'ironic'
19:03:33 <devananda> #link https://wiki.openstack.org/wiki/Meetings/Ironic
19:03:41 <devananda> agenda is pretty much the same --
19:03:48 <devananda> #topic API specification
19:04:01 <devananda> martyntaylor and I have been going back and forth for a while
19:04:06 <devananda> but I think it's in a pretty good state now
19:04:12 <GheRivero> o/
19:04:17 <lifeless> cool
19:04:23 <devananda> \o GheRivero ! thanks for joining :)
19:04:31 <devananda> #link https://etherpad.openstack.org/ironic_api
19:04:35 <devananda> that's the work-in-progress
19:05:10 <devananda> so
19:05:23 <devananda> anyone have questions/thoughts related to the API they want to raise here?
19:06:25 <ohadlevy> devananda: how come in the API a node has only one disk?
19:06:47 <devananda> ohadlevy: that represents the local storage capacity, not # of disks
19:07:01 <ohadlevy> devananda: yes, I understand, but it represents only one disk?
19:07:13 <devananda> ohadlevy: as far as how to partition that storage, we'll piggyback on Cinder's volume API for that --later--
19:07:17 <ohadlevy> devananda: shouldn't disk be an array?
19:07:18 <ohadlevy> ok
19:07:37 <devananda> yea. this is a more dumb representation of "what resources to expose to Nova"
19:07:41 <devananda> than what is physically there
19:07:47 <martyntaylor> ohadlevy: the disk field essentially means the number of GB of storage available, and is borrowed from Nova
19:08:16 <lifeless> so the size exported to nova
19:08:31 <lifeless> should be the size of sda; if the raid array is jbod mode, that's a single disk
19:08:34 <ohadlevy> martyntaylor: i get that, just assume you might need to be able to schedule based on that at some later point, but thanks :)
19:08:42 <lifeless> if it's in raid6 or whatever, it's the total usable storage.
19:08:48 <lifeless> ohadlevy: that's what Cinder is for
19:09:01 <lifeless> ohadlevy: and it's planned ;)
19:09:29 <martyntaylor> ohadlevy: no worries, I was working largely from devananda's original resource representations; if you guys think anything is missing or inappropriate, happy to change/add stuff
19:09:40 <ohadlevy> what about timestamps of last deployed etc? or is all of that in nova?
19:09:50 <devananda> fwiw, i'm trying to keep feature-creep out. there's a lot we need eventually, but also a lot we _don't_ need to get something stood up and working initially
19:10:04 <ohadlevy> last time called home etc
19:10:25 <devananda> ohadlevy: well, the nodes don't "call home" since there's no agent, but your question is valid
19:10:47 <devananda> the internal db model does track some of that
19:11:05 <ohadlevy> also.. what about hw model etc? shouldn't we expose some inventory value here?
19:11:40 <lifeless> ohadlevy: why?
19:11:55 <lifeless> ohadlevy: remember that the scheduler is in nova (for now, maybe another project in future)
19:12:31 <devananda> ohadlevy: ironic is not a CMDB :)
19:12:40 <martyntaylor> fwiw nodes and chassis do have meta-data resources that can contain arbitrary key/vals
19:12:41 <ohadlevy> lifeless: well, i assume you need to map model <-> driver?
19:12:45 <lifeless> ohadlevy: nope
19:13:08 <devananda> ohadlevy: definitely no such mapping
19:13:12 <lifeless> ohadlevy: there are two ways to tackle hardware-specific driver issues; the first - and best - is to bake all the drivers for your environment into your images.
19:13:26 <lifeless> ohadlevy: -> it's an OS problem. And every OS around today can do that.
19:13:42 <lifeless> ohadlevy: the second, if you have a broken driver that will lock up machines if it's installed and the hardware is missing...
19:13:52 <ohadlevy> lifeless: why an OS problem? if it's an HP iLO, you might want to use a different driver later on etc? that's before the OS?
19:13:53 <lifeless> ohadlevy: is to export those nodes as a different flavor
19:13:58 <devananda> ohadlevy: if you refer to the control / deploy drivers, that is configurable, not automatic.
19:14:21 <devananda> ohadlevy: did you mean the ironic driver, not the in-OS device drivers?
19:14:32 <ohadlevy> dendrobates: yes
19:14:36 <ohadlevy> oops
19:14:41 <ohadlevy> devananda: ^^
19:14:42 <lifeless> ohadlevy: lastly, as devananda says - every chassis has a specific driver, and it's up to the enrollment process [which is currently out of band] to decide what driver for what chassis.
19:14:51 <devananda> precisely ^
19:14:59 <lifeless> ohadlevy: the enrollment process might want to consult lshw
19:15:12 <lifeless> ohadlevy: and when we fold it into the core, I'm sure it will be configurable :)
19:15:21 <ohadlevy> ok.. I just think that when you list the systems, it helps to see which hw model they are on :)
19:15:31 <devananda> the different drivers may have very different requirements, so changing a node from driver X to driver Y is not necessarily just "change one field"
19:15:51 <ohadlevy> but i understand :)
19:16:09 <lifeless> ohadlevy: there's a bunch of stuff folk might be interested in; do we need to model it? Or just provide a 'description' field that a CMDB or sysadmin can populate?
19:16:14 <devananda> ohadlevy: as martyntaylor pointed out, there is a metadata field which the deployer could populate with $stuff. if that is useful to them, fine. it's not (presently) useful to Nova
19:16:20 <lifeless> ^ tada
19:16:21 <uvirtbot> lifeless: Error: "tada" is not a valid command.
19:16:22 <devananda> :)
19:16:29 <lifeless> uvirtbot: oh hush.
19:16:31 <uvirtbot> lifeless: Error: "oh" is not a valid command.
19:16:33 <devananda> haha
19:16:39 <lifeless> uvirtbot: srsly?
19:16:39 <lifeless> :P
19:16:41 <uvirtbot> lifeless: Error: "srsly?" is not a valid command.
19:16:50 * lifeless steps away from the robot
19:17:01 <devananda> i'm going to move on
19:17:11 <ohadlevy> +1
19:17:19 <devananda> if there are more discussions folks want to have around the API, please poke on the dev list :)
19:17:33 <devananda> otherwise, i'll give martyntaylor the go-ahead with this spec in the next few days
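
To make the storage and metadata points above concrete, here is a minimal sketch of a node resource along the lines discussed: total local storage as a single GB figure exposed to Nova, plus free-form key/vals the deployer can populate. The field values and most field names here are illustrative assumptions, not taken from the etherpad spec.

    # Illustrative sketch only -- not the actual representation from
    # https://etherpad.openstack.org/ironic_api; names/values are assumptions.
    example_node = {
        "id": "1be26c0b-03f2-4d2e-ae87-c02d7f33c123",  # hypothetical UUID
        "driver": "pxe_example",       # chosen at enrollment, not auto-detected
        "cpus": 8,
        "memory_mb": 16384,
        "local_gb": 500,               # total usable local storage, not per-disk
        "meta_data": {                 # arbitrary key/vals for a CMDB or sysadmin
            "hw_model": "example-vendor-1u",
            "rack": "r12",
        },
    }

Per-disk layout and partitioning would come later via Cinder's volume API, per the discussion above.
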
19:17:38 <devananda> #topic blueprints
19:17:52 <devananda> #link https://blueprints.launchpad.net/ironic
19:18:04 <devananda> i put a bunch up last week, most got assigned
19:18:12 <devananda> some are pretty broad and could be subdivided
19:18:24 <devananda> so if you're looking to get involved and don't see something available, see ...
19:18:33 <devananda> #link https://etherpad.openstack.org/IronicWhiteBoard
19:18:46 <devananda> for a broader discussion of what's done / in progress / can be divvied up
19:19:16 <devananda> i've got a framework for the manager service, RPC layer, drivers, and API done and in place
19:19:37 <devananda> so i believe there's enough "meat" for folks to start taking smaller chunks and doing them in a day or two, then posting reviews back
19:19:48 <devananda> eg, i tossed this up today for the PXE driver port: https://etherpad.openstack.org/ironic_pxe
19:19:49 * anteaya applauds devananda
19:20:15 <devananda> any specific questions / concerns / "hey can I do X" ? :)
19:20:58 <devananda> also, if you have a BP assigned to you, and want to talk about it, now's a great time
19:21:38 <GheRivero> pxe driver is on the way... tomorrow should be
19:21:48 <GheRivero> most of the pieces
19:22:40 <devananda> GheRivero: great!
19:22:41 <GheRivero> image_download is going to take a bit more
19:23:24 <devananda> GheRivero: i'd like to encourage you to post it to gerrit for early review, even if it's just a draft or WIP
19:23:31 <devananda> same goes for everyone
19:23:38 <GheRivero> yeah, sure
19:24:27 <devananda> if no one else has anything about BPs, i'll just open the floor
19:24:32 <devananda> #topic open discussion
19:25:44 <lifeless> we could talk deploy mechanisms :)
19:25:49 <devananda> i should also mention, we'll have auto docs soon
19:25:59 <devananda> lifeless: hah. sure, that too :)
19:26:46 <devananda> once the #-infra team fixes PBR, I will land this: https://review.openstack.org/#/c/30523/
19:26:50 <martyntaylor> gentlemen, unfortunately I am going to have to drop off for a while; I'll leave the chat open and catch up when I return
19:27:06 <devananda> martyntaylor: np. thanks!
19:27:52 <devananda> lifeless: in absence of other topics, you're up :)
19:27:55 <lifeless> so
19:28:01 <lifeless> there is a meme
19:28:06 <lifeless> about 'glanceclient in ramdisk'
19:28:33 <lifeless> I find that idea worrying; other people don't. Why don't other people? Or what is it that makes it appealing?
19:29:12 <dkehn> I'm assuming the appeal would be performance related
19:29:16 <devananda> one of the appealing aspects, IMHO, is the "scaling glance is already understood, so let's leverage that to scale image distribution to bare metal"
19:29:18 <lifeless> I put my concerns in prose - on https://blueprints.launchpad.net/ironic/+spec/deploy-agent
19:29:57 <lifeless> devananda: so, scaling glance isn't well understood
19:30:01 <dkehn> but resiliency would worry me
19:30:12 <lifeless> devananda: glance depends heavily on compute node local caches.
19:30:55 <devananda> lifeless: aiui, glance can be easily backed by something scalable, eg swift/ceph/$foo.
19:31:31 <lifeless> devananda: with appropriate credentials on each machine accessing it, yes - and the ceph backend stuff for that is cool
19:31:44 <lifeless> devananda: but that's not of benefit to glanceclient-on-transient-machines
19:32:16 <devananda> if we assume image $foo will be in a file system cache $somewhere on the network
19:32:28 <devananda> (whether ceph or ironic-manager / nova-compute, doesn't matter)
19:32:48 <lifeless> compressed or uncompressed ?
19:33:15 <devananda> then the number of times and network distance that said image must traverse before it is written to disk to deploy $alot of nodes is the question
19:33:22 <devananda> and whether it's compressed / where it's uncompressed :)
19:33:39 <lifeless> also the amount of latency that is incurred
19:33:41 <lifeless> python is slow
19:33:46 <devananda> sure
19:33:52 <devananda> but we don't need to actually use python-glanceclient
19:33:56 <lifeless> python -can- be fast.
19:34:02 <devananda> aiui, glance has HTTP(S) clients as well
19:34:08 <lifeless> but significant care is needed
19:34:27 <lifeless> so, we have a network and hardware and can do some testing
19:34:33 <devananda> i saw some graphs (link escapes me now) of the overhead of python-glanceclient vs other things for pulling the same images
19:34:38 <devananda> it was definitely non-trivial
19:35:33 <lifeless> so, I think - at best - glance will be a performance wash; at worst it will be substantially slower
19:36:08 <devananda> which is more likely to serve the image from FS cache -- glance or ironic-manager?
19:36:11 <lifeless> oh, I forgot to add load-management to my notes.
19:37:05 <lifeless> I don't see FS cache as a key issue; given a dedicated glance you can load 20 1GB images (for a highly varied environment) very quickly, and a single machine can hold that in RAM
19:37:12 <lifeless> for any non-trivial server
19:37:16 <devananda> right
19:37:26 <devananda> but for the ironic-manager, that may not be the case
19:37:28 <lifeless> that said, ironic is more likely to serve it from FS cache.
19:38:01 <lifeless> because it's more likely to be dedicated to the task, vs glance that may also be used by end users
19:38:11 <devananda> if we require it to be cached by ironic-manager, there's a new requirement -- don't deploy ironic-manager in a small VM / on a 1U pizza box, because it won't have enough RAM to deploy quickly
19:38:13 <lifeless> and thus suffer cache pressure from other images
19:38:23 <devananda> heh
19:38:52 <devananda> besides image manipulation, ironic should be pretty low overhead.
19:39:09 <devananda> push the image downloading out to the ramdisk, and what major resource consumption is left?
19:39:20 <devananda> (nothing that i am aware of)
19:40:00 <lifeless> you can overload the network and slow everything down if you're not careful
19:40:10 <lifeless> so you'll need to manage that complexity
19:40:23 <lifeless> - force deploys into rolling mode rather than all at once
19:40:34 <devananda> that's not unique to the in-ramdisk situation
19:40:46 <devananda> the same is true if ironic is caching the images
19:41:01 <devananda> you could overwhelm the network _or_ the ironic manager IO bus
19:41:04 <devananda> or both
19:41:29 <devananda> "you can overload the network by being too scalable" isn't really an argument against doing something scalably :)
19:41:31 <lifeless> right, but if ironic is driving, ironic has a very simple knob to control it
19:41:46 <devananda> ironic would still be driving the deploy ramdisk
19:41:49 <lifeless> actually, it's a key factor in defining scalable
19:41:58 <devananda> yes. but not against doing it
19:42:06 <lifeless> uhm
19:42:09 <devananda> each manager will be tracking how many deploys are in progress / in what state
19:42:14 <devananda> whether locally or via ramdisk
19:42:49 <lifeless> let me rephrase my prior comment: doing things with more concurrency than service points is *not* doing it scalably.
19:42:55 <devananda> i don't understand this: "if ironic is driving". why would ironic _not_ be driving the deploy ramdisk?
19:42:58 <lifeless> scalable != infinite parallel.
19:43:48 <lifeless> scalable == responsive API + low average and 95% time-to-complete, as requests per [interval] goes up
19:44:13 <lifeless> devananda: at the moment ironic has no agent on the ramdisk; the ramdisk polls
19:45:25 <lifeless> at the moment, ironic has no way to directly drive the agent on the ramdisk; the ramdisk polls
19:45:54 <devananda> ah
19:45:57 <devananda> so
19:46:16 <devananda> yes. perhaps that isn't clear in what i've drafted :)
19:46:28 <devananda> the ramdisk agent needs to be driven by ironic-manager
19:46:35 <devananda> for many (other) reasons (too)
19:46:48 <lifeless> From a latency perspective
19:47:04 <lifeless> I'd like to reduce the number of times the ironic-manager process has to interact with the deploy ramdisk
19:47:09 <devananda> even so, the manager will know how many deploys are in progress at any time, even if it's not driving an agent directly
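
As a concrete illustration of the "simple knob" and rolling-deploy idea above -- the manager knowing how many deploys are in progress and capping concurrency -- here is a minimal sketch. It assumes the manager itself throttles deploys; the class and option names (DeployThrottle, max_concurrent_deploys) are hypothetical, not from the Ironic code.

    import threading

    # Hypothetical sketch: cap how many deploys a single ironic-manager drives
    # at once, so a burst of requests rolls out gradually instead of saturating
    # the network or the manager's IO bus.
    class DeployThrottle(object):
        def __init__(self, max_concurrent_deploys=4):
            self._slots = threading.Semaphore(max_concurrent_deploys)

        def deploy(self, node, do_deploy):
            """Run do_deploy(node), waiting for a free slot first."""
            with self._slots:
                return do_deploy(node)

Under these assumptions, twenty deploy requests arriving at once would proceed four at a time, with the rest queued until a slot frees up.
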
19:47:44 <lifeless> so, my concerns can be summarised thusly: I'm skeptical of a python API delivering images.
19:47:58 <lifeless> in the same way I'd be skeptical of a Python API delivering block storage *content*
19:48:12 <lifeless> [vs arbitrating access to low level implementations]
19:48:13 <devananda> ack
19:48:26 <lifeless> I'm worried about the requirements for impersonation
19:49:04 <lifeless> and I'm worried about the resulting interlocking drive-step-by-step that will hurt latency on each machine if we drive each agent rather than having them be actors
19:49:30 <lifeless> it's not that it's not doable, it's that I think the net effect is hard to implement well, and doesn't offer benefits.
19:49:57 <devananda> i have that (security) concern today, too. and agree that we need a solution before doing the impersonate-a-glance-client bit
19:50:41 <devananda> so it sounds like we have similar concerns. i'm happy to stick this ^ list on the BP as things
19:50:45 <lifeless> I think a much simpler design is to:
19:50:48 <devananda> that must be addressed (either before or by it)
19:51:37 <lifeless> - stick with what we have until we're ready to do torrents
19:51:58 <lifeless> - then talk with the glance folk about secured-bittorrent-from-glance
19:52:23 <lifeless> - and if they want that, implement the bittorrent glue as an adjacent service in glance
19:52:33 <lifeless> - or if they don't, make it ironic specific
19:52:51 <lifeless> doing 'glanceclient' in the general sense just seems hard+uninteresting
19:53:09 <lifeless> [where failing to do the hard bit well will be bad for performance/security or both]
19:53:51 <lifeless> I'm not against responsibilities for this living in glance: I'm against using the existing service that solves a different load problem to do it.
19:53:54 <lifeless> :)
19:54:05 <lifeless> what-do-you-think-of-that
19:54:36 <devananda> i like ^. we've talked about it before :)
19:54:43 <devananda> sounds like the makings for a BP
19:55:16 <lifeless> I will be happy if we can drop the 'glance client in ramdisk' discussion, in favour of the above
19:55:25 <devananda> i do not like the current implementation -- using a shell init script + netcat and calling it an "agent"
19:55:55 <lifeless> devananda: is the iscsi/dd the thing you don't like, or the lack of testable/extensible code in the ramdisk ?
19:56:08 <lifeless> devananda: because, if it's the latter, I fully support making that better.
19:56:11 <devananda> both. much more the second
19:56:22 <devananda> i think we will want iscsi/dd functionality for some cases
19:56:26 <devananda> even far down the road
19:56:32 <devananda> so i don't want to completely abandon that
19:56:34 <lifeless> the iscsi/dd thing is pretty much ideal until we step up to a torrent-like-thing
19:56:41 <lifeless> IMNSHO
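
For readers unfamiliar with the iscsi/dd mechanism being debated here: roughly, the deploy ramdisk exposes the node's local disk as an iSCSI target, and the deploying side attaches it and streams the image onto it. A rough sketch follows, under the assumption that the deploy side shells out to iscsiadm and dd; the function name, device path, and parameters are placeholders, not the actual Ironic or nova-baremetal code.

    import subprocess

    def write_image_over_iscsi(portal, target_iqn, image_path):
        """Placeholder sketch: attach the disk exported by the deploy ramdisk
        as an iSCSI target, then write the image onto it with dd."""
        # Discover and log in to the target exported by the ramdisk.
        subprocess.check_call(
            ['iscsiadm', '-m', 'discovery', '-t', 'sendtargets', '-p', portal])
        subprocess.check_call(
            ['iscsiadm', '-m', 'node', '-T', target_iqn, '-p', portal, '--login'])
        try:
            # Device path is a placeholder; the real path depends on the session.
            dev = '/dev/disk/by-path/ip-%s-iscsi-%s-lun-1' % (portal, target_iqn)
            subprocess.check_call(
                ['dd', 'if=%s' % image_path, 'of=%s' % dev, 'bs=1M', 'oflag=direct'])
        finally:
            subprocess.check_call(
                ['iscsiadm', '-m', 'node', '-T', target_iqn, '-p', portal, '--logout'])
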
19:56:54 <devananda> and i don't want python-glanceclient in the ramdisk, but something-that-pulls-from-glance
19:56:59 <lifeless> robust standards, with auth and good performance
19:57:10 <devananda> i'm happy to clarify the wording around not-python-client
19:57:25 <devananda> but i also see an intermediate step
19:57:26 <lifeless> so, I feel like we reached consensus, and then stepped away.
19:57:29 <devananda> yea
19:57:34 <devananda> slightly away, not much
19:57:51 <lifeless> *why* do you want something-that-pulls-from-glance? That's what I've been arguing *against*
19:58:18 <devananda> between the current no-secure-key-exchange, terrible "agent", iscsi/dd
19:58:28 <devananda> and the eventual secure exchange, smart agent, bittorrent
19:58:33 <devananda> i think there's room for a middle step
19:58:35 <devananda> :)
19:58:50 <lifeless> cautious-ack
19:59:17 <lifeless> I don't see why using glance's existing server-to-server/server-to-developer functionality makes any sense though
19:59:33 <devananda> hm. you may have some insight there that i lack, then
19:59:46 <lifeless> fixing the agent to be testable, event driven, more robust - great.
19:59:53 <lifeless> fixing the security aspects - great.
20:00:05 <lifeless> messing with the most efficient way to send block data around on the network - WTF.
20:00:10 <devananda> heh
20:00:56 <devananda> i'll update the BP
20:01:19 <devananda> to clarify the parts we agree on, and add notes/whiteboard about the parts we almost agree on
20:01:25 <devananda> anyhow, out of time for today
20:01:25 <lifeless> cool
20:01:27 <devananda> thanks all!
20:01:33 <devananda> #endmeeting