15:00:03 <dtantsur> #startmeeting ironic
15:00:04 <openstack> Meeting started Mon Jan 28 15:00:03 2019 UTC and is due to finish in 60 minutes. The chair is dtantsur. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:07 <openstack> The meeting name has been set to 'ironic'
15:00:18 <dtantsur> Hi all! Who is here for the most ironic meeting in the world? :)
15:00:24 <dnuka> o/
15:00:26 <kaifeng> o/
15:00:27 <mgoddard> \o
15:00:29 <tendulke> o/
15:00:30 <bdodd> o/
15:00:30 <etingof> o/
15:00:34 <cdearborn> o/
15:00:36 <mjturek> \o
15:00:48 <mgoddard> \o
15:00:54 <arne_wiebalck_> o/
15:01:05 <dtantsur> Welcome everyone! Our agenda is as usual here:
15:01:08 <dtantsur> #link https://wiki.openstack.org/wiki/Meetings/Ironic
15:01:37 <dtantsur> #topic Announcements / Reminder
15:01:50 <dtantsur> #info TheJulia is traveling for on-site meetings this week
15:01:58 <dtantsur> this was from last week's agenda, but I guess it still holds
15:02:06 <rpittau> o/
15:02:09 <iurygregory_wfh> o/
15:02:13 <dtantsur> #info Successful midcycle last week: https://etherpad.openstack.org/p/ironic-stein-midcycle
15:02:26 <dtantsur> please review the notes and do not forget any action items you took :)
15:02:45 <dtantsur> anything else to announce or remind of?
15:03:26 <dtantsur> #topic Review action items from previous meeting
15:03:33 <dtantsur> #link http://eavesdrop.openstack.org/meetings/ironic/2019/ironic.2019-01-14-15.00.html
15:03:42 <dtantsur> No action items here, so moving on
15:03:51 <dtantsur> #topic Review subteam status reports (capped at ten minutes)
15:04:01 <hjensas> o/
15:04:07 <dtantsur> #link https://etherpad.openstack.org/p/IronicWhiteBoard around line 233
15:05:55 <dtantsur> hjensas: how is neutron event processing progressing?
15:06:53 <hjensas> dtantsur: progress, but slow. I started looking at the event processor last week. I will continue this week. The API patch also needs some more work.
15:07:10 <dtantsur> does it make sense to put it on the priority list this week?
15:08:12 <hjensas> mgoddard was hesitant about merging the API version without also actually doing something with the events, i.e. to avoid introducing the changed behaviour without an API version change later.
15:08:21 <hjensas> dtantsur: ^
15:09:06 <dtantsur> well, "do nothing" will be supported behavior even afterwards, with the noop network interface
15:09:07 <mgoddard> there are things we could do about that, such as not bumping the API version yet, or adding a second API version bump when we support an event
15:09:28 <dtantsur> I think we don't change API versions when drivers/interfaces start/stop supporting something
15:11:05 <mgoddard> I guess I'm not a hard -1 based on that, it just seems a little odd to change behaviour without an API bump
15:11:54 <mgoddard> I suppose it's unavoidable sometimes
15:13:07 <rajinir> o/
15:14:58 <dtantsur> yeah, we do it quite often
15:15:09 <dtantsur> anyway, let's bring it back to the patch
15:15:14 <dtantsur> anything on the statuses?
15:15:48 <iurygregory_wfh> zuulv3 status is in https://etherpad.openstack.org/p/zuulv3_ironicprojects_legacyjobs
15:15:55 <dtantsur> yeah, I added the link
15:15:57 <iurygregory_wfh> almost finished =)
15:16:08 <dtantsur> #link https://etherpad.openstack.org/p/zuulv3_ironicprojects_legacyjobs zuulv3 migration status
15:16:31 <dtantsur> okay, moving on?
15:17:02 <dtantsur> #topic Deciding on priorities for the coming week
15:17:13 <dtantsur> let me remove the finished things
15:17:52 <dtantsur> hjensas, mgoddard, should we add the neutron events work to the priorities?
15:18:54 <mgoddard> hjensas: it needs an update right now, right?
15:19:28 <hjensas> The API change needs an update, to improve the data validation stuff.
15:19:40 <dtantsur> hjensas: will you have time to keep updating it this week?
15:19:52 <hjensas> I will work on it this week.
15:19:55 <dtantsur> awesome
15:20:43 <dtantsur> how's the list looking to everyone?
15:21:42 <hjensas> no objections. :)
15:22:01 <mgoddard> looks good. I'll aim to get deploy templates to a place where it could be on that list next week
15:22:30 <mgoddard> or at least some of it
15:22:43 <dtantsur> that would be really good
15:22:58 <dtantsur> okay, moving to the discussion?
15:23:39 <dtantsur> #topic Bikolla
15:23:44 <dtantsur> mgoddard: the mic is yours
15:23:55 <mgoddard> thanks
15:24:15 <mgoddard> I've been working on a little project unimaginatively called bikolla
15:24:32 <mgoddard> it uses kolla-ansible to deploy a standalone ironic cluster
15:24:50 <dtantsur> \o/
15:24:51 <mgoddard> and parts of bifrost to build an image & deploy nodes
15:25:08 <mgoddard> it's really just a proof of concept
15:25:49 <mgoddard> the idea being that we get good support for standalone ironic in kolla-ansible, and potentially take pressure off of the ironic team with bifrost
15:26:12 <dtantsur> yeah, I think the installation bits in bifrost kind of duplicate $many_other_installers
15:26:30 <mgoddard> at the moment I have it working in a CentOS VM, using Tenks to create virtual bare metal
15:26:30 <dtantsur> and switching to kolla sounds natural to me
15:27:02 <mgoddard> so really this is an invitation to anyone who's interested in this, or Tenks, to give it a try
15:27:03 <mgoddard> https://github.com/markgoddard/bikolla
15:27:40 <dtantsur> #link https://github.com/markgoddard/bikolla prototype of kolla-ansible + bifrost
15:27:47 <mgoddard> I think that's all I have to say for now, any questions/comments?
15:27:49 <dtantsur> thanks mgoddard, this is curious
15:28:11 <kaifeng> a dumb question, does this involve containers?
15:28:20 <rpittau> mgoddard, that looks very interesting
15:28:33 <mgoddard> kaifeng: not a dumb question! It uses the kolla containers, deployed via kolla-ansible
15:29:09 <mgoddard> if you check the README, there is a dump of 'docker ps'
15:29:22 <mgoddard> 11 containers :)
15:29:31 <iurygregory_wfh> woa
15:29:32 <kaifeng> oh yeah, I had a feeling kolla does containerized deployment, but I never took a look at it :)
15:29:40 <dtantsur> mgoddard: I may have something to remove one of the containers as the next topic ;)
15:30:03 <mgoddard> dtantsur: kill it!
15:30:06 * dtantsur also wonders what iscsid is doing there
15:30:13 <mgoddard> poor little rabbit
15:30:15 <dtantsur> hehe
15:30:20 <mgoddard> iscsid is for iscsi deploys
15:30:26 <rpittau> dtantsur, why hating rabbits so much? :D
15:30:28 <dtantsur> yeah, but why a separate container?
15:30:32 <mgoddard> why not?
15:30:34 <dtantsur> I'm pretty sure we don't have it in tripleo
15:30:42 <mgoddard> possibly not
15:30:49 <mgoddard> you could run it on the host
15:30:52 <dtantsur> well, that's an argument :) but ironic does not start the server on the conductor side, the server is on IPA
15:31:14 <mgoddard> isn't the server tgtd?
15:31:23 <mgoddard> client uses iscsid?
15:31:46 <dtantsur> mgoddard: maybe? still a bit weird to have it as a separate container. I would assume it's for Cinder.
15:32:13 <dtantsur> https://docs.openstack.org/kolla-ansible/4.0.0/cinder-guide.html#cinder-lvm2-back-end-with-iscsi
15:32:13 <mgoddard> kolla puts everything in a container
15:32:26 <mgoddard> it can also be used for cinder
15:32:32 <dtantsur> yeah, but I doubt ironic needs iscsid
15:33:01 <mgoddard> turns out I'm using the direct deploy interface by default (like bifrost), so it won't use it anyway
15:33:02 <dtantsur> maybe I don't know something about it
15:33:08 <dtantsur> heh
15:33:39 <mgoddard> anyways, thanks for listening, happy to help anyone wanting to use it
15:34:04 <dtantsur> mgoddard++
15:34:13 <dtantsur> #topic RFE review
15:34:47 <dtantsur> #link https://storyboard.openstack.org/#!/story/2004874 Support JSON-RPC as an alternative for oslo.messaging
15:35:03 <dtantsur> #link https://review.openstack.org/633052 PoC patch
15:35:04 <patchbot> patch 633052 - ironic - [PROTOTYPE] Use JSON-RPC instead of oslo.messaging - 8 patch sets
15:35:23 <dtantsur> it actually passed all devstack jobs at one point (I changed it to remove the Flask dependency after that)
15:35:43 <dtantsur> I think it's pretty cool for standalone usage like in bikolla/bifrost
15:36:05 <dtantsur> I don't suggest we approve the RFE right now, but your comments are welcome :)
15:36:42 <mgoddard> do you think it's suitable for non-standalone?
15:37:04 <dtantsur> mgoddard: I don't see why not
15:37:17 <dtantsur> but the non-standalone case will have rabbitmq anyway (for nova and other services)
15:37:30 <mgoddard> unless we can persuade them :)
15:37:53 <dtantsur> I was told some of the projects actually use the message queue features of oslo.msg
15:37:56 <mgoddard> avoiding a middle-man seems like a good thing
15:38:09 <mgoddard> any downsides?
15:38:21 <mgoddard> resilience to conductor restarts?
15:38:39 <dtantsur> yeah, a request will get aborted if a conductor fails mid-way
15:39:01 <dtantsur> but since oslo.msg only implements "at most once" semantics, I think it can happen with it as well
15:39:02 <mgoddard> lots of connections required if I run a million conductors?
15:39:04 <kaifeng> hmm, actually that applies to rabbitmq too
15:39:27 <dtantsur> mgoddard: if you have a million conductors, each of them will talk to rabbit
15:39:44 <kaifeng> rabbitmq has retry ability, how about json-rpc?
15:40:00 <dtantsur> kaifeng: it's just HTTP, you can use retries, HTTPS, etc
15:40:25 <dtantsur> I don't even use a special client in my PoC patch, just the plain 'requests' lib
15:40:38 <mgoddard> dtantsur: true, although it puts the high fanout in one place (for better or worse)
15:40:51 <mgoddard> seems like an interesting PoC
15:41:24 <dtantsur> I guess I'll have to provide some kind of authentication for it before we can really land it
15:41:30 <dtantsur> and HTTPS support
15:41:32 <mgoddard> +1
15:41:59 <dtantsur> but early reviews and suggestions are welcome
15:42:19 <mgoddard> on the large conductor count question, it might affect connection reuse
15:42:33 <mgoddard> would need to be tested
15:42:42 <dtantsur> how many conductors do people practically have?
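
As an aside on the JSON-RPC RFE discussed above: a minimal sketch of what a JSON-RPC 2.0 call over plain HTTP with the 'requests' library could look like. The endpoint, port, method name and parameters are hypothetical illustrations, not taken from the PoC patch.

    # Minimal JSON-RPC 2.0 request over plain HTTP using 'requests'.
    # Endpoint, port, method and params are hypothetical, for illustration only.
    import requests

    payload = {
        "jsonrpc": "2.0",
        "method": "do_node_deploy",          # hypothetical conductor-side method
        "params": {"node_id": "abc123", "context": {}},
        "id": 1,
    }
    resp = requests.post("http://conductor.example.com:8089/",
                         json=payload, timeout=60)
    resp.raise_for_status()
    body = resp.json()
    if "error" in body:
        # JSON-RPC reports failures in an 'error' member rather than HTTP codes
        raise RuntimeError(body["error"])
    print(body.get("result"))

Because it is plain HTTP, retries, TLS and authentication can be layered on with standard tooling, which is the trade-off being discussed here.
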
15:43:00 <mgoddard> ask oath :)
15:43:05 <dtantsur> I don't think a million is anywhere near a realistic estimate :)
15:43:11 <arne_wiebalck_> with 1700 nodes we have 3 conductors
15:43:17 <dtantsur> right
15:43:29 <dtantsur> I'd bet a few dozen is enough for every practical case
15:43:32 <mgoddard> yeah, not really expecting a million
15:43:39 <mgoddard> I should expect so
15:44:37 <dtantsur> #topic Open discussion
15:44:45 <dtantsur> the floor is open
15:44:52 <arne_wiebalck_> I have a small issue
15:45:03 <arne_wiebalck_> I'd like some input on https://review.openstack.org/#/c/632774
15:45:05 <patchbot> patch 632774 - ironic - Preserve BIOS boot order upon deployment - 4 patch sets
15:45:27 <arne_wiebalck_> this is a patch to always preserve the BIOS boot order
15:45:40 <arne_wiebalck_> to make it configurable, to be precise
15:46:13 <arne_wiebalck_> while our use case is for IPMI, there were comments about whether this should be applied to other h/w types as well
15:46:29 <dtantsur> arne_wiebalck_: I'd call the new option "allow_persistent_boot_device" or something like that. and maybe have it in driver_info per node in addition to the config.
15:46:49 <arne_wiebalck_> dtantsur: I think I did now
15:47:20 <dtantsur> it looks like you only use the config option: https://review.openstack.org/#/c/632774/4/ironic/drivers/modules/pxe.py
15:47:21 <patchbot> patch 632774 - ironic - Preserve BIOS boot order upon deployment - 4 patch sets
15:47:43 <etingof> afaik, it's not just persistent, it's the device the admin manually set on the node
15:48:04 <arne_wiebalck_> dtantsur: yes, sorry, I misread
15:48:16 <dtantsur> hmm, yeah, I guess the current name makes sense as well
15:48:27 <dtantsur> also I don't think it belongs in the [agent] section, since it's not IPA-specific
15:48:47 <arne_wiebalck_> dtantsur: right, kaifeng pointed this out as well
15:48:50 <dtantsur> and I wonder if we should handle it on some top level, so that we don't have to put it in every boot interface
15:49:33 <arne_wiebalck_> dtantsur: so you think it should be available across all hardware types?
15:49:56 <dtantsur> yeah, I think this behavior should not change if you switch the driver
15:49:57 <arne_wiebalck_> it shouldn't harm, just wasn't sure if that would be useful to anyone but us
15:50:09 <arne_wiebalck_> dtantsur: that's alos a point, yes
15:50:13 <dtantsur> I think I see similar requests from customers from time to time
15:50:15 <arne_wiebalck_> s/alos/also/
15:50:38 <arne_wiebalck_> ok, that would mean updating all h/w types
15:51:41 <dtantsur> this is why I wonder if we can avoid doing that by putting this logic in one place
15:52:01 <arne_wiebalck_> ah, right
15:52:57 <arne_wiebalck_> I can have a look at whether that is possible
15:53:13 <arne_wiebalck_> otherwise, the change (as done for ipmi) is pretty simple
15:53:22 <arne_wiebalck_> and easy to understand
15:53:45 <dtantsur> yeah
15:53:50 <arne_wiebalck_> cool, thx!
15:53:50 <dtantsur> thanks arne_wiebalck_
15:53:59 <dtantsur> anyone has anything else?
15:54:43 <kaifeng> I wonder if anyone is aware of anything about in-band instance monitoring?
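
Going back to the boot order patch discussed above: a rough sketch of the per-node override idea, where a driver_info entry takes precedence over a deployment-wide option. The option name, config group and driver_info key are hypothetical and not taken from the patch under review.

    # Rough sketch: a per-node driver_info entry overrides a deployment-wide
    # config option. 'preserve_boot_order' and the [conductor] group are
    # hypothetical names used only for illustration.
    from oslo_config import cfg
    from oslo_utils import strutils

    CONF = cfg.CONF
    CONF.register_opts(
        [cfg.BoolOpt('preserve_boot_order', default=False,
                     help='Do not change the boot device order on deploy.')],
        group='conductor')

    def preserve_boot_order(node):
        """Return True if the node's existing boot order should be kept."""
        value = node.driver_info.get('preserve_boot_order')
        if value is not None:
            # driver_info values are often strings, so parse booleans leniently
            return strutils.bool_from_string(value)
        return CONF.conductor.preserve_boot_order

A helper like this, called from common deploy code rather than from each boot interface, is one way to keep the behaviour consistent across hardware types, as suggested in the discussion.
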
15:55:09 <dtantsur> we generally try to avoid touching anything in running instances
15:56:12 <arne_wiebalck_> this is/was also discussed in the context of cmdb-like functionality
15:56:25 <arne_wiebalck_> if it is not possible to get the data OOB
15:56:41 <kaifeng> well, it originated from a customer need, I just want to know if there is any mature design
15:57:48 <kaifeng> there is a need to collect stats from bm instances, but it appears to me that the only way is to have a public ip and set up a monitoring server there.
15:57:54 <mgoddard> kaifeng: we typically use monasca
15:58:09 <mgoddard> allows for collecting control plane and user logs and metrics
15:58:28 <mgoddard> users need to run agents on their instances
15:58:45 <mgoddard> the nice thing is it's multi-tenant aware
15:58:59 <mgoddard> it's quite complex though
15:59:11 <arne_wiebalck_> this is along the lines of the cmdb discussion, there was something from rackspace at some point I think
15:59:14 <kaifeng> it works for tenant networks too?
15:59:26 <mgoddard> kaifeng: http://www.stackhpc.com/monasca-comes-to-kolla.html
16:00:16 <kaifeng> thanks mgoddard, logged, will take a look
16:00:25 <mgoddard> kaifeng: you need to make the monasca API available to tenants
16:01:09 <mgoddard> kaifeng: ironic can collect stats via IPMI and send them as notifications via rabbitmq
16:01:15 <kaifeng> oh, I have no idea about monasca
16:01:32 <mgoddard> kaifeng: (that part is separate from monasca)
16:01:32 <kaifeng> so it's oob
16:01:51 <iurygregory_wfh> my experience with monasca, I can only say one word: pain XD
16:01:57 <mgoddard> monasca is usually in-band, via an agent. the ironic monitoring is OOB
16:02:10 <iurygregory_wfh> not sure if it is better now
16:02:15 <mgoddard> iurygregory_wfh: yeah, it can be difficult
16:02:35 <mgoddard> we put a lot of work into deploying it via kolla-ansible, so hopefully it's a bit easier to deploy now
16:02:38 <iurygregory_wfh> the main problem was memory XD
16:02:39 <kaifeng> thanks anyway, I think I need to take a look at monasca first :)
16:02:44 <dtantsur> okay, let's wrap it up
16:02:48 <dtantsur> #endmeeting