15:00:16 #startmeeting ironic
15:00:16 Meeting started Mon Aug 22 15:00:16 2022 UTC and is due to finish in 60 minutes. The chair is iurygregory. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:16 The meeting name has been set to 'ironic'
15:00:30 Hello everyone, welcome to our weekly meeting
15:00:34 o/
15:00:39 o/
15:00:40 o/
15:00:43 o/
15:00:52 o/
15:00:52 o/
15:01:11 o/
15:01:14 you can find the agenda for the meeting in the wiki
15:01:17 #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting
15:01:18 o/
15:01:29 #topic Announcements / Reminder
15:01:31 o/
15:02:43 o/
15:02:52 the first announcement is a sad one
15:03:13 #info Ilya Etingof Passed Away. Goodbye, etingof!
15:03:21 #link https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030062.html
15:03:38 Sorry to hear that, my condolences.
15:04:05 I would like to share with our community, some of us knew him a lot
15:04:07 o/
15:05:45 OH no, I'm so sorry.
15:07:03 I am very saddened to learn of etingof's passing. My condolences to the ironic community, Red Hat, and Ilya's family and friends.
15:07:03 #info This week we will release our non-client libraries
15:07:55 that's sushy also?
15:08:05 ajya, correct
15:08:06 sushy, ironic-lib, metalsmith
15:08:10 yeah
15:08:26 we need to check for outstanding patches (I have one, has some comments)
15:08:36 so we will focus on reviewing these 3 to make sure we have included what we want in Zed =)
15:08:48 dtantsur, yeah
15:09:00 can I get 2nd reviewer for this https://review.opendev.org/c/openstack/sushy/+/850899 and include that in release?
15:09:27 ajya, sure we will try to include the open patches =)
15:09:27 I'll look; I haven't traditionally worked much on sushy but should probably ramp it up.
15:09:33 thanks
15:09:52 #info Antelope PTG etherpad
15:10:07 #link https://etherpad.opendev.org/p/ironic-antelope-ptg
15:10:21 just a reminder that our etherpad for the PTG is this one =)
15:11:39 #info PTG registration
15:11:49 #link https://openinfra-ptg.eventbrite.com/
15:11:56 don't forget to register for the PTG
15:12:33 Please register for the PTG so the foundation knows how many attendees plan to actively engage. This allows them to have information for future planning as well, so everyone attending registering would help them a lot.
15:13:34 #info ironic-ui is fixed =)
15:13:51 * TheJulia suspects we all need to dance now
15:14:28 yay!
15:14:31 I don't know the irc handle of Vishal, tks for the help!
15:14:39 #link https://review.opendev.org/c/openstack/ironic-ui/+/852702
15:15:21 no action items from previous meeting, skipping
15:15:31 #topic Review subteam status reports
15:15:40 #link https://etherpad.openstack.org/p/IronicWhiteBoard
15:15:59 starting around L90
15:18:25 Are there even meaningful updates there to review?
15:18:43 I guess the one w/r/t anaconda
15:18:46 Anaconda CI
15:18:53 yup =)
15:19:32 Uhh... so... tl;dr is I cannot use opendev's mirror system without hacking in another feature (maybe) into the interface to explicitly delineate package repositories versus all the other artifacts
15:19:55 =(
15:20:11 in essence, the mirror can't take on more stuff without there being an increasingly negative impact, and the guidance is to just use public mirrors for folks doing Rocky linux
15:20:23 Have we looked at if we can make the install lighter/faster in any way to help get past timeouts?
15:20:23 so..
I *think* the net effect is I just need to get the timing right
15:20:39 I'm mainly curious if we can pass flags to anaconda, disable some of the setup steps to get a thinner test
15:20:40 it is fairly minimal, downloads can just take a ton of time
15:20:48 but I haven't looked at a breakdown of what it's spending most of the time in
15:20:59 except that one where we saw it was taking ~5 minutes for all the packages
15:21:00 possibly, although again, I'm thinking we're talking borderline featurey things
15:21:09 maybe those are okay since they would just be jinja2 parsing
15:21:11 but really ... 5 minutes is small in context of an hour+ job
15:21:42 yeah, latest run got to configuring the kernel post-install at 1hr. I think we're also having the lack of paravirt kill us too
15:21:56 since it is a lot of CPU overhead on uncompressing
15:22:14 I'm going to push forward, already working to make it its own single job
15:22:25 just wanted folks to generally be aware
15:22:40 if you want to pair on this at any point, or just get a second set of eyes, feel free to ping me and/or we can even set aside some time
15:22:51 The "evil" option in my brain is just look to see if anaconda checked in with ironic, and then abort the deploy
15:22:56 which could be valid too...
15:23:04 Maybe we should discuss that instead
15:23:07 that's exactly the kind of short circuiting I'm looking for
15:23:26 Ultimately, it all comes down to the template and its contents, it is hyper customizable
15:23:28 ironic does all the orchestration up front; do we want the CI to "catch" anaconda breakages, or just Ironic's ability to set the table for anaconda?
15:24:02 It wouldn't catch the close out of the deploy...
but that should be fairly clear if it breaks based upon reports/issues
15:24:16 but starting/template processing/data handling, it would catch any issues there
15:24:33 we could even intentionally have a deploy break and return an error to ironic
15:24:41 testing the unhappy path is arguably more important than the happy one
15:24:56 depending on what, yeah
15:25:07 get anaconda going far enough for it to be able to do the err callback to ironic; then we know at least we setup anaconda as expected
15:25:08 Anyway, I could use opinions here, I'm a bit tired of fighting this :)
15:25:21 yeah, use me as the help for that, I'm somewhat guilty for helping upstream that sans-CI
15:25:25 That might be a really good separate test, fwiw
15:25:43 I am proposing it potentially as the only test, as we could probably skip all the package installs
15:25:44 do one that aborts, do one that errors, call it a day
15:25:53 then you get something working sooner, then worry about the happy/aborted path
15:25:59 oh, error would get called before package installs I believe
15:26:04 that's what I'm saying
15:26:25 I think you just need to feed it invalid stuffs and it calls %onerror
15:26:34 what exactly are we trying to test wrt anaconda interface?
15:26:45 minimum test I guess...
15:26:55 Heh. You could go an even step further. Write test code in anaconda template, run it, use the returned err as an indication that things were setup as expected (or not)
15:26:58 I'd like to know end to end it works
15:27:12 *but* there is a lot of overhead to make it happy in our CI
15:27:13 end-to-end == actually install an OS image?
15:27:24 well, in this case, install from a repository
15:27:30 rloo: yeah TheJulia has it working with an actual install, it just times out in the last few %
15:27:42 like last 10% it looks like
15:27:52 using the support added for repo-based deploys instead of liveimg based deploys
15:27:55 but it is quite variable based upon the mirrors
15:28:41 gad. hmm...
and it needs everything from the mirrors?
15:29:11 and where does the time out come from? can we increase it?
15:29:43 not *everything* only like 320 rpms
15:29:50 infra cannot carry the stage2/install image
15:29:56 so... no local mirrors
15:29:57 only...
15:30:03 we should move on
15:30:06 and continue in open discussion
15:30:24 I was about to say this =)
15:30:36 #topic Deciding on priorities for the coming week
15:30:43 #link https://review.opendev.org/q/status:open+hashtag:ironic-week-prio
15:31:32 I'm working on trying to get several Nova-Ironic driver patches backported, probably wouldn't hurt to get more Ironic +1s on them if folks want to add them to their review list as well. I'm tracking it here: https://etherpad.opendev.org/p/NovaPatchesFromJay
15:31:48 (I don't think we can put weekly-prio tag on nova patches)
15:32:16 Can you please help to add this patch to week priority list : https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/853621
15:32:23 I think we can hashtag it...
15:32:42 TheJulia, normally only the owner of the patch can do that if I recall
15:32:50 ahh, yeah!
15:32:52 also depends on the config for the project
15:33:18 for example https://review.opendev.org/c/openstack/nova/+/813897 has the hashtag
15:33:55 so JayF you can probably try to add the hashtag (I think it should work...)
15:34:31 kamlesh6808c, added
15:34:39 thanks !
15:34:56 I'm adding dtantsur's patch https://review.opendev.org/c/openstack/sushy/+/851023 also
15:35:43 iurygregory, all: I updated the hashtag on those nova stable patches owned by me (many are owned by others and I'm just playing frontman to get them merged lol)
15:35:54 JayF, no worries!
15:35:58 tks!
15:36:29 not sure if Eric Lei is around to update https://review.opendev.org/c/openstack/ironic-lib/+/844666
15:36:48 I'll push an edit later today so we can merge =)
15:38:20 metalsmith doesn't seem to have patches we would need to review
15:38:52 moving on o/
15:38:55 #topic Baremetal SIG
15:38:56 given the time, I think one of us should just make the change
15:39:18 NTR for the SIG
15:39:27 tks arne_wiebalck =)
15:39:45 #topic RFE review
15:40:00 I'm a bit puzzled if the topic from open discussion would be rfe review... =)
15:40:26 yep, could be
15:40:48 #info Discussion of the software RAID story
15:40:55 #link https://storyboard.openstack.org/#!/story/2010233
15:41:21 kubajj, o/
15:41:29 kubajj has been working on extending the disk protection to s/w RAID devices
15:41:55 one question we ran into is what to do with create_configuration
15:42:10 i.e. when the devices are re-created
15:42:18 Can I ask a question a step behind that?
15:42:27 sure
15:42:27 Sure
15:42:31 Why do we need the ability to explicitly skip disks that hold RAID partitions
15:42:45 if the operator already has the (thanks to kubajj) ability to skip disks based on device hints?
15:43:10 Because RAIDs are skipped by default anyway. They are handled in a different function
15:43:22 Ah, and you just want to add software raids to those that are skipped.
15:43:49 We have to be careful how we implement this to prevent a malicious actor from putting something that looks like a raid superblock on a disk to prevent being cleaning
15:43:50 Yeah, the goal is just to extend the functionality.
15:43:51 **cleaned
15:44:56 JayF: everything but the partitions which form the RAID are cleaned
15:45:09 well, almost everything :-D
15:45:34 Yeah, I'm not saying we shouldn't do it, I'm saying we should be careful and make sure there's an opt-out for anyone with a higher security bar
15:46:13 JayF: sure, unless you explicitly say on the node that you would like to skip sth, all will be cleaned
15:46:24 like before
15:46:29 awesome
15:46:40 this is about the special case where you have multiple s/w RAID devices
15:46:48 I apologize, some of this stuff, I don't know what happened when I wasn't looking so I appreciate you filling in the context
15:46:50 and you would like to skip cleaning *some*
15:47:22 exactly, the plan is to use the volume_name as mentioned in the story and include it in the skip_block_devices list in the properties section like with normal disks
15:48:00 the volume name is the name you give the device itself (not the block device file), it is md device metadata
15:49:02 https://review.opendev.org/c/openstack/ironic-python-agent/+/853182 enables actually creating logical disks with volume name enabled
15:49:17 I have tested it out on our testing node and it works
15:49:21 and the question was if there is an obvious problem with this ... I think dtantsur mentioned the inspector as one potential source of problems
15:50:23 otherwise we go ahead and see where it gets us :)
15:52:13 I liked the idea, just trying to understand the inspector problem ...
15:52:51 the inspector adds the root device to the inspector data (I think)
15:53:11 Are we not worried at all about the ability for whoever got that device provisioned to them being able to change that volume name?
15:53:25 Like, if that's not a case we're worried about; awesome...
but it's trivial for that volume name to change
15:54:13 I was going to suggest PARTUUID but pretty much any unique identifier is changeable from the system :(
15:54:18 * TheJulia wonders if we're scope creeping to cover all possibilities as opposed to trying to cover 90%
15:54:27 if you change the volume name, the next cleaning would erase your data
15:54:30 TheJulia: that's why I asked if we were worried about it :D
15:54:36 (not to say everything is good!, but obviously we need to start somewhere)
15:54:37 arne_wiebalck: of course, so it fails safe
15:54:50 aight, sounds like fun :) I look forward to reviewing it
15:54:55 heh
15:55:10 ok, we can check with dtantsur directly once more, seems he is not here atm
15:55:37 I'm kinda here, not following the discussion tho
15:55:50 I have a small item for open discussion, if we've talked this one through
15:56:06 JayF, go ahead
15:56:25 So, most of you know I've been working a new job, with 20% time dedicated generally to openstack and 80% to the other project I work on, Armada
15:56:33 iurygregory: the problem with inspector is that it does not have access to any symlinks on the node
15:56:35 over the next few weeks, those percentages will be swapping and you'll be seeing more of me around
15:57:07 I'm going to focus, until I get into a good cycle of my own work, on upstreaming some stuff from downstream here, stable maintenance, and reviews
15:57:19 but feel free to nerd snipe me in for bigger stuff as I will have the time to give
15:57:23 JayF, nice
15:57:59 dtantsur: I fail to see how that is a problem
15:58:21 the discussion was around resolving /dev/md/
15:58:25 I know we don't have a formal role for it anymore, but generally I am going to try to be the grand-poo-bah of stable branches, getting stuff backported (and starting with heralding our long-outstanding patches to our nova colleagues)
15:58:28 I still haven't read the full scrollback
15:58:37 dtantsur: we can do this tomorrow or so
15:58:49 JayF: I feel bad
asking someone to look at our bug list.. but someone needs to look at our bug list
15:58:50 That /dev/md/ is going away in newer mdadm aiui
15:59:02 when you said volume name, I assumed that was shorthand for the partition label
15:59:02 cool!
15:59:32 dtantsur: when I get ramped up, i'll specifically cut aside some time to work on bugs
15:59:37 JayF: is it hiding in under the lvm interface?
15:59:38 JayF: no, what we mean is the name of the md device (which is sth you can set on an md device)
15:59:50 arne_wiebalck: that is going away in newer mdadm/kernel combinations
15:59:58 arne_wiebalck: people were complaining about it being gone in gentoo channels the other day
16:00:04 arne_wiebalck: without /dev/md/, volume name seems useless
16:00:04 oh noes
16:00:12 JayF: you mean the --name option?
16:00:21 dtantsur: why?
16:00:28 arne_wiebalck: how else do you access it?
16:00:31 kubajj: I *think* so? I'd have to look in depth to remember exactly
16:00:37 mdadm --detail
16:00:51 arne_wiebalck: okay, here is the trick: how are you going to do it on the inspector side? :)
16:00:57 assuming you want it as part of the root device hints
16:01:08 dtantsur: why do I need to do it on the inspector side?
16:01:14 arne_wiebalck: because we already do
16:01:16 we are at the top of the hour for the meeting , going to close and we can continue talking :D
16:01:22 we parse and process root device hints in inspector
16:01:24 #endmeeting
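
Note on the software RAID discussion: the idea of matching a `volume_name` entry in `skip_block_devices` against the md device name reported by `mdadm --detail` could be sketched roughly as below. This is only an illustration under stated assumptions, not the ironic-python-agent implementation: the function names `md_volume_name` and `volumes_to_skip` are hypothetical, the sample `mdadm --detail` text is abbreviated and made up for the example, and the shape of the skip list (a list of hint dictionaries with a `volume_name` key) is assumed from the discussion above.

```python
# Illustrative, abbreviated `mdadm --detail` output (assumed sample, not a real capture).
SAMPLE_DETAIL = """\
/dev/md127:
           Version : 1.2
        Raid Level : raid1
              Name : node1:large  (local to host node1)
              UUID : 00000000:00000000:00000000:00000000
"""


def md_volume_name(detail_output):
    """Return the md volume name found in `mdadm --detail` text, or None."""
    for line in detail_output.splitlines():
        key, sep, value = line.partition(" : ")
        if sep and key.strip() == "Name":
            # Drop the trailing "(local to host ...)" note, if present.
            name = value.split("(", 1)[0].strip()
            # The name may be prefixed with "<homehost>:"; keep the last part.
            return name.rsplit(":", 1)[-1] or None
    return None


def volumes_to_skip(skip_block_devices, md_details):
    """Return {md device: volume name} for devices named in the skip list.

    skip_block_devices: list of hint dicts (assumed shape), e.g.
        [{"volume_name": "large"}, {"name": "/dev/sda"}]
    md_details: {md device path: `mdadm --detail` text for that device}
    """
    wanted = {
        hint["volume_name"]
        for hint in skip_block_devices
        if "volume_name" in hint
    }
    skipped = {}
    for device, detail in md_details.items():
        name = md_volume_name(detail)
        if name in wanted:
            skipped[device] = name
    return skipped
```

With this shape, a device whose volume name was changed or removed simply no longer matches the skip list and would be cleaned like any other device, which is the fail-safe behaviour mentioned at 15:54:27 and 15:54:37.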