15:00:15 #startmeeting ironic
15:00:16 Meeting started Mon Jun 15 15:00:15 2020 UTC and is due to finish in 60 minutes. The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:20 The meeting name has been set to 'ironic'
15:00:21 Good morning everyone!
15:00:21 o/
15:00:22 o/
15:00:27 Time for another meeting full of Ironic!
15:00:28 \o
15:00:28 o/
15:00:31 o/
15:00:33 o/
15:00:35 \o/
15:00:39 o/
15:00:39 \o
15:00:52 o/
15:01:03 o/
15:01:08 Our agenda this week can be found on the wiki
15:01:10 #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting
15:01:30 #topic Announcements / Reminder
15:01:36 Looks like we have two items on the agenda
15:02:11 The first is the priorities for the Victoria cycle are up for review. Please review/comment as soon as reasonably possible.
15:02:13 #link https://review.opendev.org/#/c/720100/
15:02:13 patch 720100 - ironic-specs - Victoria Cycle Priorities - 8 patch sets
15:02:24 The second item is regarding a meeting for partitioning on Wednesday
15:02:46 Looks like dtantsur is hosting a call at 2 PM UTC on Wednesday
15:02:48 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015422.html
15:03:00 yep, note the s/July/June/
15:03:05 Does anyone have anything to announce or remind us of?
15:03:19 dtantsur: No time machines available ? :)
15:03:21 two weeks till our first coordinates intermediate release?
15:03:27 * coordinated
15:03:48 or do I recall the dates incorrectly?
15:04:24 Yeah, roughly
15:04:35 Week of June 29th
15:04:36 o/
15:04:44 Hence why I keep pushing for reviews to occur :)
15:04:56 aha, cool. if so, we may want to start slowing down the new feature stream (not that we have many) and concentrate on the sprint goals
15:05:30 and I need to make bifrost-based upgrades work..
15:05:31 a good portion of that is basically code review at this point
15:05:32 o/
15:06:07 dtantsur: if that ends up being sprint 2, I don't think that specifically is the end of the world
15:06:13 \o rajinir
15:06:29 yep. but I like hacking on bifrost =^_^=
15:07:39 i think it is largely going to be input data based with maybe a slightly different scenario test. We'll figure it out :)
15:08:04 Anyway, we had no action items from our last meeting, so I believe since we've not ratified the priorities we can proceed to review priorities?
15:08:46 it would be good to have status updates on the whiteboard
15:08:53 even if there are no formally approved priorities
15:09:07 Seems reasonable
15:09:21 #topic Review subteam status reports
15:09:28 #link https://etherpad.openstack.org/p/IronicWhiteBoard
15:09:39 Starting around line 253
15:10:07 iurygregory: could you update the DHCP-less section?
15:10:14 dtantsur, doing now =)
15:10:17 great :)
15:10:34 arne_wiebalck: question for you on line 284
15:11:22 TheJulia: right, there is the OOM issue and the inspector issue
15:11:30 TheJulia both are scaling issues
15:11:35 arne_wiebalck: yeah, we need to work those out
15:11:56 TheJulia: shall we keep a "scaling" topic or address these individually?
15:12:09 I think we should scale back the topic
15:12:14 For next week :)
15:12:20 👍
15:12:54 arne_wiebalck: looks like you are using non-standalone inspector already
15:13:02 Looks like we just need people to do a final review on sig whitepaper
15:13:31 kaifeng: what is a non-standalone inspector?
15:13:33 arne_wiebalck: did you have a deadline date in mind?
15:13:43 TheJulia: I was thinking end of this week
15:13:54 TheJulia: to give everyone a chance to read once more
15:14:04 TheJulia: and then file it with the foundation
15:14:05 arne_wiebalck: API/conductor split for inspector
15:14:11 TheJulia: or is this too long?
15:14:12 arne_wiebalck: maybe thursday?
15:14:17 Iury Gregory Melo Ferreira proposed openstack/ironic master: Add L3 boot section to the docs https://review.opendev.org/689844
15:14:18 TheJulia: ok
15:14:19 which I guess you needed to have multi-node inspector
15:14:23 arne_wiebalck: just thinking that way you can send it on friday :)
15:14:30 TheJulia: ok
15:14:48 dtantsur: kaifeng : all our controllers run API/conductor/inspector
15:15:07 arne_wiebalck: oh, you don't split API/conductor for inspector?
15:15:11 And their config doesn't fundamentally need the sync process at all
15:15:26 the tl;dr is the inspector sync pounds ironic's API.
15:15:28 dtantsur: excellent!
15:15:34 well, we cannot expect everyone who uses inspector with many nodes to disable the sync
15:15:36 since they don't use it a pxe filter
15:15:54 dtantsur: you mean running API and conductor on different hosts?
15:15:55 it's not about the PXE filter, it's about syncing the node list
15:16:01 dtantsur: agreed, but no filter basically means it is entirely redundant as well. So multiple things that should and can be fixed
15:16:06 arne_wiebalck: as different processes
15:16:30 TheJulia: I remember there was at least one problem with disabling it, cannot fully remember
15:16:35 dtantsur: the only downside i see having a knob is not deleting records from the inspector DB :\ since it will do the lookup if memory serves
15:16:41 anyway, some people may use the PXE filter, so we need to actually fix it
15:16:56 currently inspector doesn't have hashring to split nodes management
15:16:57 * dtantsur ponders attaching an inspector instance to a conductor group
15:17:22 I mean, the biggest problem right now is that all 12 (?)
inspector instances try to do the sync for all nodes
15:17:29 with the leader election in place it will be only one
15:17:35 which is 12x less load on ironic already
15:17:43 dtantsur: yes
15:17:56 Let's have this discussion during open discussion
15:17:56 dtantsur: I think this is a scalable solution
15:18:00 ok
15:18:10 yep
15:18:19 Has anyone looked at neutron event processing recently?
15:18:36 hjensas, kaifeng, rpittau ^^^
15:18:48 haven't got the time, sorry
15:18:53 nope
15:19:28 nope, I want to get back to it. but need to find time.
15:19:29 re: v6, I guess we never clicked the backport button?
15:20:23 * dtantsur dunno
15:20:38 Julia Kreger proposed openstack/ironic stable/ussuri: Add IPv6 ci Job https://review.opendev.org/735614
15:20:40 no, it was not backported yet
15:20:48 #easy
15:21:30 iurygregory: re: grenade, can you elaborate on "related to ngs key"
15:21:37 note that the job seems red on master
15:21:51 TheJulia, yeah going to add more info there =)
15:22:39 Okay, well seems good then. Everyone ready to proceed to priorities for the week?
15:22:46 I am
15:22:54 let's
15:23:09 #topic Deciding on priorities for the coming week
15:23:17 #link https://etherpad.opendev.org/p/IronicWhiteBoard
15:23:32 Starting at line 116!
15:23:51 Looks like we got a number of items merged last week. Thanks everyone who helped there!
15:24:24 As for things to add, dtantsur do you think it would be good to add your inspector leader election patch for at least initial feedback?
15:24:37 * TheJulia deletes merged items
15:25:17 yep
15:25:17 TheJulia, added =)
15:25:21 TheJulia: can we please add https://review.opendev.org/735335
15:25:21 patch 735335 - ironic-python-agent-builder - Disable automatic updates in dnf-based systems - 3 patch sets
15:25:31 rpittau: go ahead :)
15:25:34 ok!
15:26:19 You guys don't need my permission to add items :)
15:26:37 dtantsur: i remember not all backends support leader election, maybe we need a more generic way?
15:26:38 * iurygregory added items without asking =X
15:26:55 kaifeng: well, I'm not aware of a more generic way
15:27:06 anyway, if the backend returns NotImplemented, all nodes run the tasks
15:27:14 TheJulia: oh it's just to see if there's place, based on priorities :)
15:28:09 dtantsur: hmm, better than nothing ;)
15:28:10 I've added a few things too
15:28:20 kaifeng: yeah, at least it does not make things worse :)
15:28:32 memcached and redis backends support leader election, etcd does not
15:28:49 dtantsur why not use amqp to support task schedule
15:29:09 Qianbiao: we're trying to get rid of amqp :) and anyway, I'm not sure it's related
15:29:12 any node which gets the message will run the task. other nodes will just ignore
15:29:18 we don't have any entity to schedule the task, it's periodic
15:29:32 is there anything else people see that should be in the list for this week?
15:29:56 ok if amqp will be removed
15:30:06 dtantsur I find you are working on inband deploy steps, i got a question with inband steps: may it support restart bmc during steps?
15:30:09 not sure it will ever be removed, but I'd avoid a new dependency on it
15:30:10 we likely need to look at https://review.opendev.org/#/c/731644/
15:30:10 patch 731644 - ironic - Fix Redfish handle no continuous override boot src - 4 patch sets
15:30:20 Qianbiao: (let's talk later) it can be a deploy step
15:30:25 ok.
15:30:28 TheJulia: good call
15:30:44 TheJulia, dtantur: Please do :-)
15:31:19 *dtantsur
15:31:28 * dtantsur wonders if next week we should only keep things that are bug fixes and sprint priorities
15:31:48 also https://review.opendev.org/#/c/730366/
15:31:48 patch 730366 - ironic - Allow node lessee to see node's ports - 1 patch set
15:31:53 dtantsur: Likely
15:32:29 and somebody (me?)
needs to talk to the release team on whether we actually need to change the release model
15:32:45 That looks good, I went through some of the items that got no review traffic it looks like
15:33:11 dtantsur: likely and I think we have to if we want to keep our stress level to manageable levels.
15:33:32 TheJulia: the wind is changing, we may be able to keep the model and the stress level
15:33:49 I'll talk to the folks
15:33:52 okay
15:33:57 * TheJulia doesn't want more stress though
15:33:59 * TheJulia wants less stress
15:34:02 * dtantsur wonders if he has ACL to do #action
15:34:03 :)
15:34:05 heh
15:34:11 dtantsur: I can note one if you want
15:34:14 if you need any help let me know dtantsur =)
15:34:26 iurygregory: I will quite likely need help, but not on the talking stage
15:34:28 :)
15:34:38 yeah =)
15:34:42 there'll be an awkward moment when we figure out how the CI for new branches will work
15:34:52 (remove grenade, add more bifrost or whatever)
15:34:59 gotcha
15:35:01 #action dtantsur to go talk to release team about model and stuff
15:35:17 dtantsur: I thought we needed to keep grenade for openstack.... needs
15:35:29 dtantsur: inherently broken out of the gate and then fix is likely okay in my book :)
15:35:34 rpittau: not for intermediate branches which are explicitly targeting standalone usage
15:35:39 since we can't anticipate everything
15:35:40 yep
15:35:51 the problem with grenade is, it expects to know the branches
15:35:54 That may also mean we need to get them to force merge or squash patches into one change set
15:35:59 no, of course, I was thinking about major openstack releases
15:36:03 if we have stable/17.0, it won't know what to upgrade from
15:36:22 hence the desire to have a bifrost-upgrade CI job
15:36:24 dtantsur: override variables likely needed in that case
15:36:26 ++
15:36:27 this we can override the config ..
15:36:32 (which will also need to know the branches, but that's solvable)
15:36:43 yeah, all is possible, let's figure it out as we go
15:36:47 Anyway, sounds like everyone is good with the list of priorities
15:37:01 arne_wiebalck: was there anything else besides the whitepaper which we've already touched upon a few times?
15:37:07 (specifically for the SIG)
15:37:14 I don't think so.
15:37:31 Then in that case, let's proceed to Open Discussion! :)
15:37:55 ++
15:38:48 #topic Open Discussion
15:39:37 so, scaling of the ironic-inspector?
15:39:59 I think the leader election should do it, no?
15:40:20 if configured properly, likely.
15:40:24 Otherwise, we may need sth to describe how to set up services.
15:40:45 do you use memcached or redis already?
15:40:50 I... still think a global "don't do this" knob may be good in some configurations
15:40:58 agree with ^^^
15:41:01 Yes, for Cinder.
15:41:04 I just don't think we should call it a fix for this bug
15:41:16 do we have sync interval for this?
15:41:16 arne_wiebalck: okay, so an easy change for you
15:41:21 heh
15:41:31 kaifeng: we don't handle interval==0, we should
15:41:32 dtantsur: oh yeah, I was never thinking it was a bug fix, more a "well, it doesn't really make sense" in this case change
15:41:39 we only see an effect in prod
15:41:49 TheJulia: tss, lemme first lure arne_wiebalck into testing my leader election patch!
15:41:55 due to the scale, but prod is ... prod :)
15:42:02 that's sad, it would be a fast solution if we have
15:42:05 interval==0 is not a bad idea, although it would be similar to the patch I already posted
15:42:08 dtantsur: hehe
15:42:33 dtantsur: but, yeah, true should not be too complicated
15:42:41 TheJulia: yep, I commented on your patch that I'd prefer to reuse interval==0 rather than a new option
15:42:47 dtantsur: I created an internal ticket to try
15:42:50 dtantsur: sounds good to me
15:42:54 (esp.
since it's consistent with ironic)
15:43:19 reducing the frequency to 3600 did the trick for us atm
15:43:32 which is still <10 mins efeectively
15:43:35 effectively
15:43:54 and hence probably good enough
15:44:44 dtantsur: don't worry, will still try your patch :-D
15:46:16 Anything else to discuss?
15:46:25 Do we need a "review jam" ?
15:46:58 mmm, maybe?
15:47:30 not sure about a call, but maybe half-day to sit together on IRC discussing the same patches?
15:47:32 Would anyone be opposed to something on say ?thursday?
15:47:41 that could also work
15:48:12 Friday would be easier
15:49:31 I'd prefer a call over IRC only tbh. Clarifying details/context would be quicker.
15:49:51 Yeah, just looking at downstream calendars and thinking that it is a lower key day
15:50:03 We could do a hybrid, both realistically
15:50:19 both sounds good to me
15:50:57 Do we want to schedule like an hour for such?
15:51:03 or even a half hour?
15:51:21 a half hour may be more than enough for a call if we've all had a chance to kind of get ideas/thoughts together
15:51:24 Harald Jensås proposed openstack/ironic master: Switch Ironic to openstacksdk for Neutron https://review.opendev.org/734873
15:52:43 Friday is also a heavy call day for me right now
15:53:04 Thursday would probably be slightly better for me too
15:53:45 would 4PM UTC work that day?
15:54:07 Thursday? good for me
15:54:19 I could also do before 3 PM UTC on thursday
15:54:28 good for me too
15:54:40 I have an API SIG meeting at 4pm, but it's usually silent
15:54:51 3pm is our downstream meeting, no?
15:55:07 dtantsur: yeah, so after might be best from an interrupt management standpoint
15:55:08 ah, *before* 3pm
15:55:10 * dtantsur cannot read
15:55:32 2pm UTC would be ideal for me, 4pm works too
15:55:37 arne_wiebalck: does 4pm on next thursday work for you?
15:55:47 yes, all fine for me :)
15:56:01 I can also get up for 2pm, I just can't promise full caffeination
15:56:14 mmm, right, it's early for you.
let's go for 4pm
15:56:25 sounds good
15:56:29 +1
15:56:51 I'll send out an email here in a little bit just for mailing list visibility
15:57:04 Do we have anything to discuss in the next 4 minutes?
15:57:20 Preferably centered around taking over the world using bare metal and bears with drum sticks?
15:57:52 Please re-read the instructions aka the white paper!
15:58:01 * dtantsur still wants a sticker with a bear biting a redfish
15:58:50 dtantsur: https://thumbs.dreamstime.com/b/face-wild-bear-fish-close-up-concept-bears-hunting-salmon-spawning-185089098.jpg
15:59:37 exactly
15:59:49 LOL
16:00:36 * TheJulia thinks we need this sticker
16:00:42 i just noticed that in ironic-lib we do some blackmagic to allow the command to be executed without root =O
16:01:04 * dtantsur waits for iurygregory to realize ironic is made of black magic :)
16:01:42 I knew that hehe
16:02:16 but I was surprised when I saw that we check the config and override run_as_root hehe
16:02:17 Maybe cast it as more "wizardry" or "incantations"
16:02:20 I guess we can #endmeeting, then discuss the black magic?
16:02:26 lol
16:02:35 Thanks everyone! Have a wonderful week!
16:02:37 #endmeeting