15:10:29 <dtantsur> #startmeeting ironic
15:10:30 <openstack> Meeting started Mon May 13 15:10:29 2019 UTC and is due to finish in 60 minutes.  The chair is dtantsur. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:10:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:10:34 <openstack> The meeting name has been set to 'ironic'
15:10:46 <etingof> \o/
15:10:46 <dtantsur> hi all, sorry for the late start (first day after 2 weeks of absence, sigh)
15:10:50 <bdodd> o/
15:10:52 <kaifeng> o/
15:10:53 <rpittau> meeting! o/
15:10:55 <hjensas> o/
15:10:55 <arne_wiebalck> o/
15:11:00 <jroll> morning :)
15:11:09 <dtantsur> our agenda is pretty light:
15:11:12 <dtantsur> #link https://wiki.openstack.org/wiki/Meetings/Ironic
15:11:12 <rloo> o/
15:11:28 <dtantsur> #topic Announcements / Reminder
15:11:46 <dtantsur> #info The Summit is over, thanks all :)
15:11:50 <dtantsur> #link https://dtantsur.github.io/posts/ironic-denver-2019/ dtantsur's notes from the Summit + PTG in Denver
15:12:10 <dtantsur> Subjectively, a lot is going on around ironic
15:12:14 <arne_wiebalck> thanks for the notes dtantsur, that's useful
15:12:17 <rloo> thx dtantsur, for your notes on the summit/ptg!
15:12:32 <rpittau> thank you for the notes dtantsur  :)
15:12:42 <dtantsur> #info The next Summit will be in Shanghai, November 4 - 6, and the PTG will be November 6 - 8, 2019
15:13:13 <dtantsur> #link https://www.openstack.org/summit/shanghai-2019
15:13:55 <dtantsur> if you've dreamt of going to China, this is your chance :) registration is open already
15:14:19 <rpittau> well I guess I can't miss 3 summits in a row :P
15:14:25 <dtantsur> heh
15:14:35 <dtantsur> I have mixed feelings about the next one :)
15:14:37 <dtantsur> anyway
15:14:40 <dtantsur> anything else to announce?
15:14:54 <dtantsur> hmm, maybe this one:
15:15:03 <dtantsur> #link https://www.openstack.org/bare-metal/ The official bare metal program
15:15:17 <cdearborn> o/ sorry i'm late!
15:15:21 <dtantsur> also you'll notice that we have Baremetal SIG business as part of our agenda
15:15:35 <dtantsur> anything else?
15:15:38 <mgoddard> o/
15:15:59 <mjturek> o/
15:15:59 <mkrai> o/
15:16:20 <rajinir> o/
15:16:41 <dtantsur> No action items from the last week, and I doubt we have anything new on the whiteboard (or do we?)
15:17:00 <dtantsur> I guess TheJulia will propose the priorities for Train officially next week
15:17:23 <baha> o/
15:17:42 <dtantsur> those who like spoilers, check https://etherpad.openstack.org/p/DEN-train-ironic-ptg after line 639
15:17:45 <dtantsur> * 539
15:18:22 <dtantsur> #topic Deciding on priorities for the coming week
15:20:17 <dtantsur> what do you think about the list as it is now?
15:21:07 <TheJulia> I didn't have time to update it, so I think it is okay to just carry it forward if it is up to date.
15:21:23 <TheJulia> Re priorities, next Monday most likely :(
15:21:24 <dtantsur> reasonably up-to-date, I just added the mdns thingy
15:21:34 <rpittau> I did update some stuff based on merged things last week
15:21:46 <etingof> let's review this as well please? -- https://review.opendev.org/655685
15:21:46 <patchbot> patch 655685 - ironic-specs - Add spec for indicator management - 6 patch sets
15:21:47 <TheJulia> dtantsur: perfect :)
15:22:14 <dtantsur> etingof: fair enough, added
15:22:17 <dtantsur> any other comments?
15:22:29 <rloo> I haven't read the retirement spec (https://review.opendev.org/#/c/656799/) but we did discuss that at ptg, wrt beth's ask for other states? so wondering if more work is required in that spec.
15:22:30 <patchbot> patch 656799 - ironic-specs - Add support for node retirement - 4 patch sets
15:22:51 <arne_wiebalck> rloo: it's on the agenda :)
15:23:13 <dtantsur> rloo: yeah, we have this on the agenda, and I'd like it on priorities to highlight this discussion
15:23:15 <rloo> arne_wiebalck: oh, is that what 'needs follow-up from the PTG' is about?
15:23:20 <dtantsur> rloo: yes
15:23:23 * etingof has 10-patch long chain against sushy-tools to review...
15:23:29 * dtantsur needs time and spoons to type letters about that spec
15:23:36 <rloo> arne_wiebalck, dtantsur: ok, thx for clarifying
15:23:39 <dtantsur> arne_wiebalck: you can find some of the information in my summary
15:23:52 <arne_wiebalck> dtantsur: ok
15:23:59 <dtantsur> etingof: I'm still -1 on half of them, but ignoring in case somebody cares less..
15:24:27 * rloo thinks there are a lot of priorities for this week but i guess we had them all in previous weeks.
15:24:28 <etingof> dtantsur, I've refactored them heavily since your last review
15:24:38 <dtantsur> okay, I'll check again if my concerns still hold
15:24:53 <etingof> though the things you did not like are still there ;)
15:24:56 <dtantsur> rloo: exactly
15:25:06 <dtantsur> etingof: well, here we go :) do you really want me to review them? ;)
15:25:42 <etingof> dtantsur, absolutely!
15:26:02 <rloo> is cisco-ucs out?
15:26:10 <dtantsur> TheJulia: ^^?
15:26:39 <etingof> dtantsur, I hope that with my recent additions (chassis, indicators and vmedia boot emulation) the design I am proposing makes more sense...
15:26:45 <TheJulia> Yeah, pretty much. :(
15:27:06 <rloo> ok, i'm deleting that from vendor priorities then.
15:27:14 <dtantsur> etingof: if you found a real manager object in libvirt - yes, otherwise I expect a similar comment
15:27:33 <dtantsur> anyway, does the list look good?
15:28:19 <TheJulia> I did chat with ?ianw? in Denver and he stressed that we should do what is best for the project.
15:28:30 <dtantsur> sigh
15:28:39 <etingof> dtantsur, it's the other way round - if you find a real manager and chassis objects in libvirt, then your design makes sense ;)
15:28:50 <TheJulia> Yeah. :(
15:29:08 <TheJulia> Anyway, airplane mode time.
15:29:21 <dtantsur> #topic Baremetal SIG
15:29:26 <dtantsur> Anything for today?
15:29:54 <TheJulia> The white paper could use some assistance with regards to use cases
15:30:01 <rloo> dtantsur: as an intro -- did anyone mention why we have baremetal sig in this meeting?
15:30:24 <dtantsur> rloo: I did not, because I was not present when it was decided
15:30:55 <rpittau> when was that decided ?
15:30:58 <TheJulia> Tl;dr there needs to be a periodic reminder, so the meeting was the best time
15:31:17 <rloo> dtantsur: i can't recall now, if there was email/announcement about this. if there wasn't, there should be, or mention it here?
15:31:43 <TheJulia> During one of the baremetal sessions there was consensus that it would be good to at least raise it as part of this meeting, given the overlap.
15:32:21 <TheJulia> Chris hodge... appears to be offline at the moment, but this is largely for human wrangling, so if there are no items then we can skip past.
15:33:04 <rloo> fair enough, and that is fine with me. could we have an action item for someone (chris?) to send out email to both groups, about this being in the ironic meeting?
15:33:26 <rloo> (unless folks disagree with it)
15:33:35 <TheJulia> I concur, I won’t be able to do it this week.
15:33:40 <rloo> (since not everyone was at ptg/summit)
15:33:51 <TheJulia> Anyway, they just closed the boarding door, need to quite literally disconnect now :(
15:34:06 <arne_wiebalck> I can ping Chris.
15:34:11 <TheJulia> Have a wonderful day everyone
15:34:23 <rloo> BYE TheJulia!
15:34:23 <dtantsur> safe flight TheJulia
15:34:32 <rpittau> TheJulia: safe flight :)
15:34:55 <dtantsur> arne_wiebalck: please! I think we can do it off-meeting.
15:34:58 <rloo> ok, AI for arne_wiebalck to ping chris to ask chris to send email out etc.
15:36:14 <dtantsur> ++
15:36:20 <dtantsur> #topic RFE review
15:36:54 <dtantsur> #link https://review.opendev.org/#/c/656799/ Support for node retirement / nodes in failure states
15:36:55 <patchbot> patch 656799 - ironic-specs - Add support for node retirement - 4 patch sets
15:37:09 <dtantsur> as I said, I did not have time to write more detailed thoughts - sorry for that
15:37:24 <dtantsur> but something that did come up in the room is that people want a new state more than a new flag
15:37:34 <arne_wiebalck> this spec is basically a summary of a discussion I had with jroll and TheJulia
15:37:39 <dtantsur> I was initially quite against it, but then got more or less convinced
15:38:12 <dtantsur> although this may be a difference between retiring nodes and "failing" them..
15:38:34 <arne_wiebalck> how do you mark an active node for retirement when retired is a state?
15:38:49 <dtantsur> with a state transition that does not, however, tear it down
15:39:12 <arne_wiebalck> so instances can be on nodes in state retired?
15:39:16 <dtantsur> that's what people wanted for nodes at fault: keep them intact, but mark them as very broken
15:39:18 <dtantsur> yep
15:39:22 <arne_wiebalck> ok
15:39:27 <jroll> hmm
15:39:28 <dtantsur> which is going to confuse the heck out of nova, I suspect
15:39:55 <rpittau> interesting, it goes slightly against logic imho
15:39:58 <jroll> I'd love to see the proposed state machine
15:40:08 <jroll> er, state diagram
15:40:11 <arne_wiebalck> the idea behind the flag was to be very close to 'maintenance', an attribute you assign to a node
15:40:25 <dtantsur> the thing about maintenance is that it can happen in literally any state
15:40:28 <arne_wiebalck> and such a flag would not interfere with the state machine
15:40:38 <arne_wiebalck> retired as well
15:40:39 <dtantsur> does it make sense to retire a node in "enroll"? "cleaning"? "manageable"?
15:40:40 <rpittau> well in theory retirement too
15:40:59 <arne_wiebalck> dtantsur: I think so, why not?
15:41:00 <rloo> i think more thinking is needed here. wondering if we want a 'retire' state, AND some flag eg 'next-step' or 'next-phase'
15:41:01 <rpittau> I think it does
15:41:46 <dtantsur> arne_wiebalck: if you mark an available node as retired, how do you prevent nova from scheduling to it?
15:42:14 <arne_wiebalck> how about using the same mechanism 'maintenance' does?
15:42:26 <arne_wiebalck> the same way
15:42:33 <dtantsur> maintenance has been a part of our API since the inception
15:42:42 <dtantsur> so it's something everybody is used to checking
15:42:50 <dtantsur> older tooling would not take retired into account
15:43:12 <dtantsur> I think this ^^ should be in the spec even if we keep the flag approach
15:43:14 <arne_wiebalck> older tooling like  ...?
15:43:37 <dtantsur> nova, metalsmith, metal3 are the tools I'm aware of
15:43:39 <arne_wiebalck> we could also set maintenance in addition to retired
15:43:58 <dtantsur> maintenance has side effects like preventing cleaning from working
15:44:12 <rloo> at what point is the node 'retired' ?
15:44:12 <arne_wiebalck> true, I take that back :)
15:44:32 <rloo> we want the node to-be-retired, but when is it 'retired'?
15:44:39 <arne_wiebalck> rloo, it is not retired, it's more that it's marked for retirement
15:44:47 <rloo> i'm having trouble understanding the state of the node in this.
15:45:13 <rloo> arne_wiebalck: from the spec, it seems like what you want is to indicate 'do not make this available'
15:45:21 <arne_wiebalck> rloo: right
15:45:53 <arne_wiebalck> I'd like to mark an 'active' node as on its way out
15:45:55 <rloo> arne_wiebalck: and (based on problem description) you want to be able to search/list nodes
15:46:03 <arne_wiebalck> rloo: correct
15:46:28 <dtantsur> arne_wiebalck: but not only active nodes, all states?
15:46:29 * arne_wiebalck thinks rloo is preparing a suggestion
15:46:37 <arne_wiebalck> dtantsur: yes
15:46:40 <rloo> so end user/nova invokes 'delete' on their instance.
15:46:50 <rloo> i'm not preparing a solution, just thinking about the problem. sorry.
15:47:00 <arne_wiebalck> dtantsur: also nodes in clean_failed may be retired
15:47:07 <arne_wiebalck> rloo: :)
15:47:47 <rloo> seems like one could want to 'retire' a node in any state. in some states, the operator could just do an 'openstack node delete', right, no need to say 'openstack node retire ...'.
15:48:15 <arne_wiebalck> rloo: correct
15:48:18 <dtantsur> yeah, this ^^ is something I'm not quite sure about. I understand we want to avoid unprovisioning an active node right away, but why keep nodes in available?
15:48:28 <rloo> wondering if it is sufficient to put in a mechanism to say 'do not make available'. but if we did that, it would not necessarily mean it is for retirement.
15:48:52 <arne_wiebalck> maintenance is very close
15:48:58 <arne_wiebalck> but has the cleaning issue
15:49:03 <rloo> wasn't there a case where someone might want to do a firmware update after an instance is removed, so they don't want the cleaned node to go to available right away?
15:49:14 <rloo> (and firmware update isn't part of the cleaning)
15:49:29 <rloo> ie, they may want the node to go to mgt after cleaning, not available.
15:50:20 <rloo> do we need a 'retirement' state, or do we want a mechanism to move instance-deleted nodes from cleaning to some non-avail state like manage?
15:51:24 <arne_wiebalck> rloo: we don't necessarily need to change the state of the node I think
15:52:16 <rloo> arne_wiebalck: so for you, you're good if you could indicate 'when deleting instance, do cleaning and go to manage (instead of available)', and then indicate via eg extra specs or something 'this is due for retirement'?
15:52:53 <rloo> i am not against a 'retire*' state or something, just trying to understand/generalize.
15:53:03 <arne_wiebalck> rloo: if it is easy to extract the list of nodes in this state: yes
15:53:11 <arne_wiebalck> rloo: sounds good :)
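A minimal sketch of the workflow rloo and arne_wiebalck describe above, using openstacksdk: mark a node for retirement via the free-form "extra" field and pull the list of such nodes later. The "retirement" key, the cloud name and the node name are made up for illustration only; nothing here is defined by the spec under discussion.

    import openstack

    conn = openstack.connect(cloud='mycloud')  # hypothetical cloud name

    # Mark an active node as due for retirement without touching its
    # provision state (note: this replaces the node's whole "extra" dict).
    conn.baremetal.update_node('node-0042', extra={'retirement': 'due'})

    # Extract the list of nodes marked for retirement.
    retiring = [node.name
                for node in conn.baremetal.nodes(details=True)
                if (node.extra or {}).get('retirement') == 'due']
    print(retiring)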
15:53:27 <dtantsur> I wonder if this solution generalizes to the case discussed on the PTG
15:53:40 <arne_wiebalck> what way discussed at the ptg?
15:53:44 <rloo> now i'd like some more operator feedback (wrt explicit 'retire*' something). and dtantsur, yeah.
15:53:46 <arne_wiebalck> s/way/was/
15:53:55 <dtantsur> "We detected that this node is broken; do not kill it right away, but mark as faulty"
15:54:10 <arne_wiebalck> sounds very close
15:54:40 <arne_wiebalck> rloo' suggestion seems to cover both, no?
15:54:46 <rpittau> although faulty sounds much closer to maintenance
15:54:47 <rloo> arne_wiebalck: i think so :)
15:55:38 <arne_wiebalck> question still is whether this should be a state
15:55:47 <rloo> rpittau: true. i think the idea was that 'faulty' was not quite the same as 'maintenance' and we shouldn't overload/use 'maintenance' for non-maintenance things. of course, we haven't really defined what 'maintenance' means :)
15:56:12 <dtantsur> arne_wiebalck: it depends on how you want the existing API consumers to behave wrt retirement
15:56:16 <rloo> arne_wiebalck: right, that's what i wanted to hear from you. do you need/want these nodes-to-be-retired, to end up in some new state?
15:56:21 <dtantsur> (not only on that, but it's a big part of the question)
15:56:31 <dtantsur> a new state will force all tooling to handle it explicitly
15:56:42 <dtantsur> a new flag on a node will be ignored by all consumers we don't update explicitly
15:57:20 <rpittau> I totally agree with the non-abuse of maintenance, but I see 2 new states here, if we go down that road
15:57:38 <arne_wiebalck> dtantsur: for the consumers, I thought ironic gives the nodes rather than nova fetching the nodes, no?
15:57:44 <rloo> rpittau: i have nothing against new states -- as long as they make sense :)
15:57:48 <dtantsur> rpittau: maintenance is ruled out because it will prevent cleaning from working
15:58:03 <dtantsur> arne_wiebalck: nova fetches nodes with some filters
15:59:16 <arne_wiebalck> dtantsur: ah, ok
15:59:29 <arne_wiebalck> dtantsur: I thought ironic hides them from nova
15:59:34 <arne_wiebalck> the ones in maintenance
16:00:10 <dtantsur> I think it's done on the nova side
16:00:27 <dtantsur> arne_wiebalck: https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L886-L894
16:00:30 <arne_wiebalck> ok ... that makes things more complicated ... in any case :)
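For context on the nova link above: the filtering dtantsur refers to happens on the consumer side, roughly along these lines. This is an illustrative sketch, not the actual driver code; the Node stand-in and the set of "bad" states are simplified assumptions.

    from dataclasses import dataclass

    @dataclass
    class Node:
        # Minimal stand-in for an ironic node record as a consumer sees it.
        provision_state: str
        maintenance: bool

    BAD_PROVISION_STATES = {'cleaning', 'clean wait', 'clean failed', 'deleting'}

    def resources_unavailable(node: Node) -> bool:
        """Roughly how a consumer such as nova decides to hide a node from scheduling."""
        return node.maintenance or node.provision_state in BAD_PROVISION_STATES

    print(resources_unavailable(Node('active', maintenance=True)))       # True
    print(resources_unavailable(Node('available', maintenance=False)))   # False

A brand-new "retired" flag would be invisible to a check like this until every consumer (nova, metalsmith, metal3, ...) is updated explicitly, whereas a new provision state forces consumers to deal with it, which is the trade-off discussed above.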
16:00:41 <dtantsur> we're a bit out of time, let's move the discussion back to the spec?
16:00:59 <dtantsur> we need to cover existing API consumers impact, allocation API impact, etc
16:01:05 <dtantsur> I also left one comment about the API design
16:01:08 <dtantsur> thanks all!
16:01:15 <arne_wiebalck> ok, please comment on the spec then, thanks!
16:01:30 <dtantsur> #endmeeting