15:00:03 <dtantsur> #startmeeting ironic
15:00:04 <openstack> Meeting started Mon Jan 28 15:00:03 2019 UTC and is due to finish in 60 minutes. The chair is dtantsur. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:07 <openstack> The meeting name has been set to 'ironic'
15:00:18 <dtantsur> Hi all! Who is here for the most ironic meeting in the world? :)
15:00:24 <dnuka> o/
15:00:26 <kaifeng> o/
15:00:27 <mgoddard> \o
15:00:29 <tendulke> o/
15:00:30 <bdodd> o/
15:00:30 <etingof> o/
15:00:34 <cdearborn> o/
15:00:36 <mjturek> \o
15:00:48 <mgoddard> \o
15:00:54 <arne_wiebalck_> o/
15:01:05 <dtantsur> Welcome everyone! Our agenda is as usual here:
15:01:08 <dtantsur> #link https://wiki.openstack.org/wiki/Meetings/Ironic
15:01:37 <dtantsur> #topic Announcements / Reminder
15:01:50 <dtantsur> #info TheJulia is traveling for on-site meetings this week
15:01:58 <dtantsur> this was from last week's agenda, but I guess it still holds
15:02:06 <rpittau> o/
15:02:09 <iurygregory_wfh> o/
15:02:13 <dtantsur> #info Successful midcycle last week: https://etherpad.openstack.org/p/ironic-stein-midcycle
15:02:26 <dtantsur> please review the notes and do not forget any action items you took :)
15:02:45 <dtantsur> anything else to announce or remind of?
15:03:26 <dtantsur> #topic Review action items from previous meeting
15:03:33 <dtantsur> #link http://eavesdrop.openstack.org/meetings/ironic/2019/ironic.2019-01-14-15.00.html
15:03:42 <dtantsur> No action items here, so moving on
15:03:51 <dtantsur> #topic Review subteam status reports (capped at ten minutes)
15:04:01 <hjensas> o/
15:04:07 <dtantsur> #link https://etherpad.openstack.org/p/IronicWhiteBoard around line 233
15:05:55 <dtantsur> hjensas: how is neutron event processing progressing?
15:06:53 <hjensas> dtantsur: progress, but slow. I started looking at the event processor last week. I will continue this week. The API patch also needs some more work.
15:07:10 <dtantsur> does it make sense to put it on the priority list this week?
15:08:12 <hjensas> mgoddard was hesitant about merging the API version without also actually doing something with the events, i.e. to avoid introducing the changed behaviour without an API version change later.
15:08:21 <hjensas> dtantsur: ^
15:09:06 <dtantsur> well, "do nothing" will be supported behavior even afterwards, with the noop network interface
15:09:07 <mgoddard> there are things we could do about that, such as not bumping the API version yet, or adding a second API version bump when we support an event
15:09:28 <dtantsur> I think we don't change API versions when drivers/interfaces start/stop supporting something
15:11:05 <mgoddard> I guess I'm not a hard -1 based on that, it just seems a little odd to change behaviour without an API bump
15:11:54 <mgoddard> I suppose it's unavoidable sometimes
15:13:07 <rajinir> o/
15:14:58 <dtantsur> yeah, we do it quite often
15:15:09 <dtantsur> anyway, let's bring it back to the patch
15:15:14 <dtantsur> anything on the statuses?
15:15:48 <iurygregory_wfh> zuulv3 status is in https://etherpad.openstack.org/p/zuulv3_ironicprojects_legacyjobs
15:15:55 <dtantsur> yeah, I added the link
15:15:57 <iurygregory_wfh> almost finished =)
15:16:08 <dtantsur> #link https://etherpad.openstack.org/p/zuulv3_ironicprojects_legacyjobs zuulv3 migration status
15:16:31 <dtantsur> okay, moving on?
15:17:02 <dtantsur> #topic Deciding on priorities for the coming week
15:17:13 <dtantsur> let me remove the finished things
15:17:52 <dtantsur> hjensas, mgoddard, should we add the neutron events work to the priorities?
15:18:54 <mgoddard> hjensas: it needs an update right now, right?
15:19:28 <hjensas> The API change needs an update, to improve the data validation stuff.
15:19:40 <dtantsur> hjensas: will you have time to keep updating it this week?
15:19:52 <hjensas> I will work on it this week.
15:19:55 <dtantsur> awesome
15:20:43 <dtantsur> how's the list looking to everyone?
15:21:42 <hjensas> no objections. :)
15:22:01 <mgoddard> looks good. I'll aim to get deploy templates to a place where it could be on that list next week
15:22:30 <mgoddard> or at least some of it
15:22:43 <dtantsur> that would be really good
15:22:58 <dtantsur> okay, moving to the discussion?
15:23:39 <dtantsur> #topic Bikolla
15:23:44 <dtantsur> mgoddard: the mic is yours
15:23:55 <mgoddard> thanks
15:24:15 <mgoddard> I've been working on a little project unimaginatively called bikolla
15:24:32 <mgoddard> it uses kolla-ansible to deploy a standalone ironic cluster
15:24:50 <dtantsur> \o/
15:24:51 <mgoddard> and parts of bifrost to build an image & deploy nodes
15:25:08 <mgoddard> it's really just a proof of concept
15:25:49 <mgoddard> the idea being that we get good support for standalone ironic in kolla-ansible, and potentially take pressure off of the ironic team with bifrost
15:26:12 <dtantsur> yeah, I think the installation bits in bifrost kind of duplicate $many_other_installers
15:26:30 <mgoddard> at the moment I have it working in a CentOS VM, using Tenks to create virtual bare metal
15:26:30 <dtantsur> and switching to kolla sounds natural to me
15:27:02 <mgoddard> so really this is an invitation to anyone who's interested in this, or Tenks, to give it a try
15:27:03 <mgoddard> https://github.com/markgoddard/bikolla
15:27:40 <dtantsur> #link https://github.com/markgoddard/bikolla prototype of kolla-ansible + bifrost
15:27:47 <mgoddard> I think that's all I have to say for now, any questions/comments?
15:27:49 <dtantsur> thanks mgoddard, this is curious
15:28:11 <kaifeng> a dumb question, does this involve containers?
15:28:20 <rpittau> mgoddard, that looks very interesting
15:28:33 <mgoddard> kaifeng: not a dumb question! It uses the kolla containers, deployed via kolla-ansible
15:29:09 <mgoddard> if you check the README, there is a dump of 'docker ps'
15:29:22 <mgoddard> 11 containers :)
15:29:31 <iurygregory_wfh> woa
15:29:32 <kaifeng> oh yeah, I had a feeling kolla does containerized deployment, but I never took a look at it :)
15:29:40 <dtantsur> mgoddard: I may have something to remove one of the containers as the next topic ;)
15:30:03 <mgoddard> dtantsur: kill it!
15:30:06 * dtantsur also wonders what iscsid is doing there
15:30:13 <mgoddard> poor little rabbit
15:30:15 <dtantsur> hehe
15:30:20 <mgoddard> iscsid is for iscsi deploys
15:30:26 <rpittau> dtantsur, why hating rabbits so much? :D
15:30:28 <dtantsur> yeah, but why a separate container?
15:30:32 <mgoddard> why not?
15:30:34 <dtantsur> I'm pretty sure we don't have it in tripleo
15:30:42 <mgoddard> possibly not
15:30:49 <mgoddard> you could run it on the host
15:30:52 <dtantsur> well, that's an argument :) but ironic does not start the server on the conductor side, the server is on IPA
15:31:14 <mgoddard> isn't the server tgtd?
15:31:23 <mgoddard> client uses iscsid?
15:31:46 <dtantsur> mgoddard: maybe? still a bit weird to have it as a separate container. I would assume it's for Cinder.
15:32:13 <dtantsur> https://docs.openstack.org/kolla-ansible/4.0.0/cinder-guide.html#cinder-lvm2-back-end-with-iscsi
15:32:13 <mgoddard> kolla puts everything in a container
15:32:26 <mgoddard> it can also be used for cinder
15:32:32 <dtantsur> yeah, but I doubt ironic needs iscsid
15:33:01 <mgoddard> turns out I'm using the direct deploy interface by default (like bifrost), so it won't use it anyway
15:33:02 <dtantsur> maybe I don't know something about it
15:33:08 <dtantsur> heh
15:33:39 <mgoddard> anyways, thanks for listening, happy to help anyone wanting to use it
15:34:04 <dtantsur> mgoddard++
15:34:13 <dtantsur> #topic RFE review
15:34:47 <dtantsur> #link https://storyboard.openstack.org/#!/story/2004874 Support JSON-RPC as an alternative for oslo.messaging
15:35:03 <dtantsur> #link https://review.openstack.org/633052 PoC patch
15:35:04 <patchbot> patch 633052 - ironic - [PROTOTYPE] Use JSON-RPC instead of oslo.messaging - 8 patch sets
15:35:23 <dtantsur> it actually passed all devstack jobs at one point (I changed it to remove the Flask dependency after that)
15:35:43 <dtantsur> I think it's pretty cool for standalone usage like in bikolla/bifrost
15:36:05 <dtantsur> I don't suggest we approve the RFE right now, but your comments are welcome :)
15:36:42 <mgoddard> do you think it's suitable for non-standalone?
15:37:04 <dtantsur> mgoddard: I don't see why not
15:37:17 <dtantsur> but the non-standalone case will have rabbitmq anyway (for nova and other services)
15:37:30 <mgoddard> unless we can persuade them :)
15:37:53 <dtantsur> I was told some of the projects actually use the message queue features of oslo.msg
15:37:56 <mgoddard> avoiding a middle-man seems like a good thing
15:38:09 <mgoddard> any downsides?
15:38:21 <mgoddard> resilience to conductor restarts?
15:38:39 <dtantsur> yeah, a request will get aborted if a conductor fails mid-way
15:39:01 <dtantsur> but since oslo.msg only implements "at most once" semantics, I think it can happen with it as well
15:39:02 <mgoddard> lots of connections required if I run a million conductors?
15:39:04 <kaifeng> hmm, actually that applies to rabbitmq too
15:39:27 <dtantsur> mgoddard: if you have a million conductors, each of them will talk to rabbit
15:39:44 <kaifeng> rabbitmq has retry ability, how about json-rpc?
15:40:00 <dtantsur> kaifeng: it's just HTTP, you can use retries, HTTPS, etc
15:40:25 <dtantsur> I don't even use a special client in my PoC patch, just the plain 'requests' lib
15:40:38 <mgoddard> dtantsur: true, although it puts the high fanout in one place (for better or worse)
15:40:51 <mgoddard> seems like an interesting PoC
15:41:24 <dtantsur> I guess I'll have to provide some kind of authentication for it before we can really land it
15:41:30 <dtantsur> and HTTPS support
15:41:32 <mgoddard> +1
15:41:59 <dtantsur> but early reviews and suggestions are welcome
15:42:19 <mgoddard> on the large conductor count question, it might affect connection reuse
15:42:33 <mgoddard> would need to be tested
15:42:42 <dtantsur> how many conductors do people practically have?
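
As an aside on the JSON-RPC RFE discussed above: a minimal sketch of what a JSON-RPC 2.0 call over plain HTTP with the 'requests' library could look like. The endpoint, port, method name and parameters are hypothetical illustrations, not taken from the PoC patch.

    # Minimal JSON-RPC 2.0 request over plain HTTP using 'requests'.
    # Endpoint, port, method and params are hypothetical, for illustration only.
    import requests

    payload = {
        "jsonrpc": "2.0",
        "method": "do_node_deploy",          # hypothetical conductor-side method
        "params": {"node_id": "abc123", "context": {}},
        "id": 1,
    }
    resp = requests.post("http://conductor.example.com:8089/",
                         json=payload, timeout=60)
    resp.raise_for_status()
    body = resp.json()
    if "error" in body:
        # JSON-RPC reports failures in an 'error' member rather than HTTP codes
        raise RuntimeError(body["error"])
    print(body.get("result"))

Because it is plain HTTP, retries, TLS and authentication can be layered on with standard tooling, which is the trade-off being discussed here.
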
15:43:00 <mgoddard> ask oath :)
15:43:05 <dtantsur> I don't think a million is anywhere near a realistic estimate :)
15:43:11 <arne_wiebalck_> with 1700 nodes we have 3 conductors
15:43:17 <dtantsur> right
15:43:29 <dtantsur> I'd bet a few dozen is enough for every practical case
15:43:32 <mgoddard> yeah, not really expecting a million
15:43:39 <mgoddard> I should expect so
15:44:37 <dtantsur> #topic Open discussion
15:44:45 <dtantsur> the floor is open
15:44:52 <arne_wiebalck_> I have a small issue
15:45:03 <arne_wiebalck_> I'd like some input on https://review.openstack.org/#/c/632774
15:45:05 <patchbot> patch 632774 - ironic - Preserve BIOS boot order upon deployment - 4 patch sets
15:45:27 <arne_wiebalck_> this is a patch to always preserve the BIOS boot order
15:45:40 <arne_wiebalck_> to make it configurable, to be precise
15:46:13 <arne_wiebalck_> while our use case is for IPMI, there were comments about whether this should be applied to other h/w types as well
15:46:29 <dtantsur> arne_wiebalck_: I'd call the new option "allow_persistent_boot_device" or something like that. and maybe have it in driver_info per node in addition to the config.
15:46:49 <arne_wiebalck_> dtantsur: I think I did now
15:47:20 <dtantsur> it looks like you only use the config option: https://review.openstack.org/#/c/632774/4/ironic/drivers/modules/pxe.py
15:47:21 <patchbot> patch 632774 - ironic - Preserve BIOS boot order upon deployment - 4 patch sets
15:47:43 <etingof> afaik, it's not just persistent, it's the device the admin manually set on the node
15:48:04 <arne_wiebalck_> dtantsur: yes, sorry, I misread
15:48:16 <dtantsur> hmm, yeah, I guess the current name makes sense as well
15:48:27 <dtantsur> also I don't think it belongs in the [agent] section, since it's not IPA-specific
15:48:47 <arne_wiebalck_> dtantsur: right, kaifeng pointed this out as well
15:48:50 <dtantsur> and I wonder if we should handle it on some top level, so that we don't have to put it in every boot interface
15:49:33 <arne_wiebalck_> dtantsur: so you think it should be available across all hardware types?
15:49:56 <dtantsur> yeah, I think this behavior should not change if you switch the driver
15:49:57 <arne_wiebalck_> it shouldn't harm, just wasn't sure if that would be useful to anyone but us
15:50:09 <arne_wiebalck_> dtantsur: that's alos a point, yes
15:50:13 <dtantsur> I think I see similar requests from customers from time to time
15:50:15 <arne_wiebalck_> s/alos/also/
15:50:38 <arne_wiebalck_> ok, that would mean updating all h/w types
15:51:41 <dtantsur> this is why I wonder if we can avoid doing that by putting this logic in one place
15:52:01 <arne_wiebalck_> ah, right
15:52:57 <arne_wiebalck_> I can have a look at whether that is possible
15:53:13 <arne_wiebalck_> otherwise, the change (as done for ipmi) is pretty simple
15:53:22 <arne_wiebalck_> and easy to understand
15:53:45 <dtantsur> yeah
15:53:50 <arne_wiebalck_> cool, thx!
15:53:50 <dtantsur> thanks arne_wiebalck_
15:53:59 <dtantsur> anyone has anything else?
15:54:43 <kaifeng> I wonder if anyone is aware of anything about in-band instance monitoring?
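
Going back to the boot order patch discussed above: a rough sketch of the per-node override idea, where a driver_info entry takes precedence over a deployment-wide option. The option name, config group and driver_info key are hypothetical and not taken from the patch under review.

    # Rough sketch: a per-node driver_info entry overrides a deployment-wide
    # config option. 'preserve_boot_order' and the [conductor] group are
    # hypothetical names used only for illustration.
    from oslo_config import cfg
    from oslo_utils import strutils

    CONF = cfg.CONF
    CONF.register_opts(
        [cfg.BoolOpt('preserve_boot_order', default=False,
                     help='Do not change the boot device order on deploy.')],
        group='conductor')

    def preserve_boot_order(node):
        """Return True if the node's existing boot order should be kept."""
        value = node.driver_info.get('preserve_boot_order')
        if value is not None:
            # driver_info values are often strings, so parse booleans leniently
            return strutils.bool_from_string(value)
        return CONF.conductor.preserve_boot_order

A helper like this, called from common deploy code rather than from each boot interface, is one way to keep the behaviour consistent across hardware types, as suggested in the discussion.
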
15:55:09 <dtantsur> we generally try to avoid touching anything in running instances
15:56:12 <arne_wiebalck_> this is/was also discussed in the context of cmdb-like functionality
15:56:25 <arne_wiebalck_> if it is not possible to get the data OOB
15:56:41 <kaifeng> well, it originated from a customer need, I just want to know if there is any mature design
15:57:48 <kaifeng> there is a need to collect stats from bm instances, but it appears to me that the only way is to have a public ip and set up a monitoring server there.
15:57:54 <mgoddard> kaifeng: we typically use monasca
15:58:09 <mgoddard> allows for collecting control plane and user logs and metrics
15:58:28 <mgoddard> users need to run agents on their instances
15:58:45 <mgoddard> the nice thing is it's multi-tenant aware
15:58:59 <mgoddard> it's quite complex though
15:59:11 <arne_wiebalck_> this is along the lines of the cmdb discussion, there was something from rackspace at some point I think
15:59:14 <kaifeng> it works for tenant networks too?
15:59:26 <mgoddard> kaifeng: http://www.stackhpc.com/monasca-comes-to-kolla.html
16:00:16 <kaifeng> thanks mgoddard, logged, will take a look
16:00:25 <mgoddard> kaifeng: you need to make the monasca API available to tenants
16:01:09 <mgoddard> kaifeng: ironic can collect stats via IPMI and send them as notifications via rabbitmq
16:01:15 <kaifeng> oh, I have no idea about monasca
16:01:32 <mgoddard> kaifeng: (that part is separate from monasca)
16:01:32 <kaifeng> so it's oob
16:01:51 <iurygregory_wfh> my experience with monasca, I can only say one word: pain XD
16:01:57 <mgoddard> monasca is usually in-band, via an agent. the ironic monitoring is OOB
16:02:10 <iurygregory_wfh> not sure if it is better now
16:02:15 <mgoddard> iurygregory_wfh: yeah, it can be difficult
16:02:35 <mgoddard> we put a lot of work into deploying it via kolla-ansible, so hopefully it's a bit easier to deploy now
16:02:38 <iurygregory_wfh> the main problem was memory XD
16:02:39 <kaifeng> thanks anyway, I think I need to take a look at monasca first :)
16:02:44 <dtantsur> okay, let's wrap it up
16:02:48 <dtantsur> #endmeeting