Monday, 2023-08-21

opendevreviewVanou Ishii proposed openstack/sushy master: Fix missing ETag when patching Redfish resource  https://review.opendev.org/c/openstack/sushy/+/89211302:28
*** dmellado81918134 is now known as dmellado819181304:48
rpittaugood morning ironic! o/06:40
kubajjGood morning Ironic07:29
opendevreviewDmitry Tantsur proposed openstack/bifrost master: Remove Fedora from the CI  https://review.opendev.org/c/openstack/bifrost/+/89212307:46
dtantsurFYI folks ^^^07:46
rpittau:(08:14
arne_wiebalckGood morning, Ironic!08:25
rpittauhey arne_wiebalck :)08:31
rpittauhey kubajj :)08:31
arne_wiebalckhey rpittau o/08:31
arne_wiebalckrpittau: do you think recheck'ing my patch is worth a try? otherwise I am a little lost how to tackle Zuul's lack of approval ... cookies worked in the past :thinking:08:32
arne_wiebalckrpittau: https://review.opendev.org/c/openstack/ironic-python-agent/+/89160908:33
rpittauarne_wiebalck: let's try with another recheck for now08:42
rpittaubtw my CS9 patch for metalsmith passed, so moving to CS9 soon08:43
arne_wiebalckrpittau: thanks!09:31
opendevreviewMahnoor Asghar proposed openstack/ironic master: Add inspection (processing) hooks  https://review.opendev.org/c/openstack/ironic/+/88755411:00
iurygregorygood morning Ironic11:40
TheJuliagood morning13:13
iurygregorygood morning TheJulia 13:14
arne_wiebalckrpittau: same metalsmith jobs failing after a recheck ... I had another look, but do not see the connection to my patch ... bdist_wheel complaints and dropped connections are the errors I see in the logs14:01
arne_wiebalckhey iurygregory and TheJulia o/14:01
iurygregoryarne_wiebalck, o/14:01
rpittauarne_wiebalck: I think the problem is related to this message 'mount: can't setup loop device: No such device or address' but not sure why that's happening14:10
rpittaualthough I think we saw that in the past14:10
TheJuliadtantsur: is that going to be backported or just a new change to remove CI jobs in older branches?14:11
arne_wiebalckrpittau: thanks for checking once more14:12
arne_wiebalckrpittau: I only searched for error, not failure :)14:12
arne_wiebalckrpittau: do other jobs suffer from this, or only my patch?14:12
rpittauit's all patches AFAICS14:12
dtantsurTheJulia: the fedora one? I'm afraid it will since nodepool changes affect all branches.14:14
dtantsurSince the job is non-voting, we can of course just ignore it..14:14
TheJuliawell, we ought to at least backport changes to remove it from our jobs/config/etc14:15
dtantsurto be clear, the change does not remove any actual functionality, so if someone has it working, it will keep working for now14:15
TheJuliaokay, then just our jobs/config for CI then14:17
arne_wiebalckrpittau: thanks ... and "phew" :-D14:18
opendevreviewJulia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/88508714:21
iurygregoryfyi I won't be able to join our weekly meeting today14:30
rpittauarne_wiebalck: I think I found the issue, need to do one more test14:30
arne_wiebalckrpittau: oh, rly? that would be great ofc ... what do you suspect as the cause?14:31
opendevreviewRiccardo Pittau proposed openstack/metalsmith master: Use jammy nodes to run CI jobs  https://review.opendev.org/c/openstack/metalsmith/+/89214614:32
rpittauwellllll... this ^ :D14:32
arne_wiebalckrpittau: :-D14:33
rpittauwe're still running on focal there and I'm afraid there could be some issues with the latest tinycore build we're using14:34
rpittauor just focal nodes not working well in general14:34
arne_wiebalck+1 ... will retry once this one is merged14:35
TheJuliafile under semi-crazy idea: https://bugs.launchpad.net/ironic/+bug/203238014:39
dtantsurTheJulia: we may get some interest in that soon (will pm you a link)14:40
dtantsurTheJulia: I'm +2 to the whole proposal, except that I'm rather +0 on the "nota bene"14:41
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: [DNM] test ci metalsmith integration jobs  https://review.opendev.org/c/openstack/ironic-python-agent/+/89214714:41
rpittauarne_wiebalck: if you keep an eye on this ^ should tell if it works14:42
opendevreviewMahnoor Asghar proposed openstack/ironic master: Add inspection (processing) hooks  https://review.opendev.org/c/openstack/ironic/+/88755414:43
arne_wiebalckrpittau: +114:43
arne_wiebalckrpittau: thanks!14:43
opendevreviewMahnoor Asghar proposed openstack/ironic master: Add inspection (processing) hooks  https://review.opendev.org/c/openstack/ironic/+/88755414:45
TheJuliadtantsur: it is just a note, nothing more14:45
JayFTheJulia: my only response is surprise we don't already do that14:47
JayFwhich usually means it's a good suggestion :D14:47
TheJuliawe needed the ability to send a url, which I think lines up to the 2019-ish timeframe14:47
TheJuliaI haven't looked at the dmtf schema rev dates recently though14:48
dtantsuryeah, it was a later addition14:48
TheJuliaand 2020.... was... 202014:49
dtantsurdark times, we're not talking about them here14:52
JayF2020, you mean that old ABC news show, right?14:55
JayFthere is no 2020 other than that I know of 14:55
JayF:D 14:55
JayF#startmeeting ironic15:00
opendevmeetMeeting started Mon Aug 21 15:00:40 2023 UTC and is due to finish in 60 minutes.  The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'ironic'15:00
TheJuliao/15:00
kubajjo/15:00
opendevreviewJulia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/88508715:01
JayFGood morning Ironic'ers! 15:01
JayFA reminder we operate under the OpenInfra Foundation CoC https://openinfra.dev/legal/code-of-conduct15:01
JayF#topic Announcements/Reminders15:01
dtantsuro/15:01
JayF#note Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio: https://tinyurl.com/ironic-weekly-prio-dash15:01
JayFI'm also going to note that15:01
JayF#note Bobcat non-client library freeze is Thursday, Aug 2415:01
JayFFinally, one about the PTG15:02
JayF#note PTG is virtual and taking place October 23-27 202315:02
JayF#link https://etherpad.opendev.org/p/ironic-ptg-october-202315:02
JayFplease use the etherpad to propose topics of interest for the PTG15:02
JayFAny comments/questions on the announcements, or anything to add?15:03
* dtantsur has nothing15:04
JayFI'm going to skip the next item; we do not have action items from the last meeting to follow up on.15:04
JayF#topic Review Ironic CI Status 15:04
JayFWe have a couple of CI-related items on the agenda I wanna let folks know about before we get into general status15:04
JayFfrickler brought it to our attention in IRC Friday afternoon that Ironic is one of the projects left with the most zuul config errors15:05
TheJuliaso, apparently our power just dropped, I don't know how much longer we'll be on15:05
JayFthis is basically when CI is so broken that zuul can't even read the config (usually it means we haven't had any patches pass testing since the zuul queue change, months or maybe a year+ ago)15:06
JayF#link https://od42.de/ironic15:06
JayFthat link doesn't work for me, but if I use the filter manually it's obvious we have old bugfix and stable branches plagued by the issue15:06
rpittauo/ (man I'm late)15:07
JayFheck, it looks like python-ironicclient gates as recent as yoga are impacted15:07
JayFdtantsur: TheJulia: Is it one of your teams that use bugfix/ branches downstream?15:07
JayFThat's one of the big pieces of info I'm missing: if these are bugfix branches which ones can we nuke15:07
dtantsurJayF: we used to rely on them; no longer I think15:08
TheJuliaMine does not15:08
dtantsurnot even for ancient releases, right rpittau?15:08
JayFAck, let me take this action then15:08
TheJuliaI *believe* there was a list made ~1 year ago which enumerated a ton of branches that could be dropped15:08
rpittauJayF dtantsur: not anymore no15:08
JayF#action JayF to audit zuul-config-errors, propose retirement of clearly-abandonded branches and try to fix broken ones 15:08
fricklerJayF: seems I need to do some URL quoting on that redirect :-/15:08
rpittauwe plan to use bugfix for metal3 upstream but only the latest 1-215:08
JayFfrickler: yeah, it happens but it's obvious what's broken :)15:08
dtantsurokay, so as far as OCP is concerned, we're fine with going back to short-lived bugfix branches15:09
rpittauyep15:09
dtantsurmetal3 - as rpittau said (thanks!)15:09
rpittaubtw I need to propose new bugfix branches this week :D15:09
JayFWould someone who isn't me mind hitting the list with a "bugfix branch update" saying some of this15:09
JayFwith a proposal for how long they should live, etc?15:09
JayFI don't wanna just guess and rugpull, but it's obvious right now we keep 'em up for too long15:09
rpittauI thought we've updated already our docs ?15:09
dtantsurWe can simply get back to what I proposed in the spec back then15:09
dtantsurand yeah, the docs15:09
JayFack15:10
fricklerJayF: fixed15:10
JayFto summarize my understanding of the existing docs: bugfix branches are retired when their letter'd counterpart goes out15:10
JayFfrickler: oooh very nice15:10
JayFlol stable/pike NGS15:10
rpittauthe bugfix branches last for at most 6 months15:10
JayFthat screams "jay didn't retire this with the rest" :( 15:11
rpittauthen they can get pulverized15:11
JayFack; that works for me15:11
JayFSo the other CI-related item we have is15:11
dtantsuryeah, 6 months matches my recollection15:11
JayFAfter the chat in IRC last week about janders's change and not getting tested on real hardware15:11
JayFI reached out to the HPE team, they claim to have fixed HPE Third Party CI.15:11
JayFhttps://review.opendev.org/c/openstack/ironic/+/889750 one of their examples, has a run on it15:12
JayF#note HPE Third Party CI is functioning again.15:12
dtantsur\o/15:12
rpittaunice15:12
JayFIs there anything generic about CI we need to speak about?15:13
JayFI think other than the endless sqlite locking battles we've been cleaner than usual?15:13
rpittauthe metalsmith src jobs are busted at the moment15:13
rpittauthis impacts ipa CI15:13
JayFack15:14
rpittauI think it depends on the fact that they still use focal15:14
rpittauso proposed https://review.opendev.org/c/openstack/metalsmith/+/89214615:14
JayFthat makes sense to me, and is a forced migration anyway15:14
JayFwe shouldn't release "B" with it on focal anyway15:14
rpittauyep15:14
JayFThanks for looking into that.15:15
JayFAre there any other outstanding CI items?15:15
rpittaualso CS9 jobs in metalsmith -> https://review.opendev.org/c/openstack/metalsmith/+/86937415:15
JayFlanded that one just now15:15
rpittaugreat, thanks!15:15
JayFDo we have a userbase for metalsmith?15:15
JayFI feel like I only ever hear about it when CI is broken15:15
JayFI assume CERN since arne_wiebalck has some activity on it?15:16
arne_wiebalcknope15:16
rpittaummm not sure, maybe TheJulia or hjensas know ?15:16
arne_wiebalckI just ran into it since zuul is not happy with my raid rebuild patch15:16
JayFInteresting. OK. Maybe I'll add that to PTG topics.15:16
TheJuliasorry what might I know?15:16
JayFTheJulia: if we have any known users of metalsmith15:17
TheJuliaJust RHOSP15:17
TheJuliaafaik15:17
dtantsurWhen I created it, I hoped that people start using it just in general, as a handy CLI15:17
dtantsurMaybe I was naive, and we need something equal in ironicclient (and the backing API)...15:17
JayFRHOSP is a pretty big user of it then :D15:17
JayFdtantsur: it's possible for all those things to be true at the same time :D15:18
JayFdtantsur: metalsmith can blaze a trail ,we can use that to figure out how to make it work in primary clients/apis15:18
dtantsurthe planned but never implemented Deployment API was the next logical step for me15:18
JayFthat's good to know though, I just want someone to have a use case for it, RHOSP totally counts15:18
JayFdtantsur: maybe toss that on PTG topics and we can res it? 15:18
JayFnobody is going to make time to do it if we don't talk about it and hype it up15:19
TheJuliaI added Metalsmith to the ptg topic list15:19
JayFI can be your hype man dtantsur 15:19
dtantsurI don't see a point in that. We've had these discussions over and over again.15:19
dtantsurheh15:19
dtantsurUntil someone has a vested interest, it just does not happen...15:19
TheJuliaProblem is, at least in my circles, it gets viewed as this "alternative to ironic" or "replacement of ironic"15:19
TheJuliaand people don't really grok that it is just a client15:19
dtantsurlol15:19
TheJuliayeah15:19
JayFMaybe the answer from PTG is gonna be to make better docs out of it :)15:20
JayFI was talking to kubajj this morning about how doing non-nova Ironic deploys is not very intuitive15:20
TheJuliaI think the real issue is tons of people don't know how to actually *use* ironic15:20
dtantsurOr decide how we can decompose metalsmith into smaller bits and gradually merge15:20
JayFand AFAICT we lack a directive doc on how to do it exactly15:20
TheJuliaeven though there are videos, pages, everything else15:20
JayFTheJulia: YES15:20
dtantsurOH YEAH15:20
dtantsurinstance_info anyone?15:20
TheJuliaalmost like we need a class15:20
dtantsurWho does not like JSON fields without schema validation?15:20
TheJuliamost people I talk to don't even think along those lines, it is a big scary thing they just don't understand in general15:21
dtantsurIf the most basic thing Ironic is doing needs to be taught... we're losing :(15:21
JayFWell, we don't always like to frame Ironic this way15:21
JayFbut Ironic deployments are super easy to do... if you have nova in front15:21
TheJuliait might just be resistance to information because they have no need to touch it because that is not their primary role15:21
dtantsurBy the way, the outreachy season is coming. If we have an easy win, we can try proposing it.15:22
JayFthis is a secondary use case and we've always treated it as a secondary use case, whether that's right/wrong/etc15:22
JayFdtantsur: I will have an MLH intern around that season too15:23
JayFdtantsur: but we'd need rough docs to be able to get an intern to curate it into not-terrible docs15:23
dtantsurWe have rough docs, no?15:23
JayFdtantsur: I'll note: incoming interns are also why I'm working on contributor guide updates now (probably Tues you'll see a post with a radical improvement on our ironic-in-devstack docs)15:23
JayFdtantsur: if so I couldn't find them in 10 minutes while working w/kuba15:23
dtantsuris https://docs.openstack.org/ironic/latest/user/index.html what you mean?15:24
dtantsurparticularly, https://docs.openstack.org/ironic/latest/user/deploy.html15:24
JayFthat is exactly the doc I was looking for earlier15:24
JayFdtantsur+++++15:24
JayFkubajj: ^^ fyi I'll also slack the link to you15:24
dtantsurI've spent quite some time on this document, but I'm sure it can be improved much further15:25
dtantsurespecially the configdrive explanation is lacking15:25
JayFyeah I got feedback about this stuff being confusing from a lost-potential-user the other day too15:25
kubajjJayF: I was reading this in the morning but got some error, thought it might be just bifrost15:25
* TheJulia wonders what is the attention span window we should be focusing on15:25
TheJulia"what is documentation in the tiktok generation" might be another way of framing that mental musing15:26
dtantsurheh15:27
rpittaulol15:27
JayFI think that is based on a flawed premise: we're not doing a good job of making docs discoverable for the altavista generation either ;)15:27
dtantsurWe have a lot of vague concepts. Like instance_info itself.15:27
rpittaushould we do a "bare metal deployment dance" ?15:27
JayFwe have a large number of docs, it's borderline impossible to know which one you need15:27
JayFand the vague concepts like dtantsur points out makes it hard to know what to search for15:27
dtantsurWell, dunno. I think "User Guide" is a pretty natural place to look15:28
* JayF was convinced at SCALE20x by a librarian that we need one15:28
dtantsurI'd be more worried that people run away screaming after reading the Installation Guide :D15:28
JayFI'm going to add a note at PTG about this, maybe we can take a swing at it or at least think about it in the intervening time15:29
JayFwell Julia beat me to it, but it's on that doc :D 15:29
JayFmoving on15:30
JayF#topic Review ongoing 2023.2 workstreams15:30
TheJuliadoc topic added to ptg etherpad15:30
JayF#link https://etherpad.opendev.org/p/IronicWorkstreams2023.215:30
JayFIt's too early to fully declare victory15:31
JayFbut this has been a crazy productive cycle it seems15:31
JayFso much impactful stuff landing and in progress15:31
TheJuliaWhere are we at on the nova side of shards key usage?15:32
JayFtesting and positive feedback to johnthetubaguy15:33
JayFthen I think he goes begging for reviews15:33
JayFI was struggling with devstack I got it working, so I should have an env to test that in this week15:33
JayFTheJulia: unless you have time and want to dedicate time to it, let me commit to doing that test on Tues15:34
JayFTheJulia: then we can free that up for John and hopefully he lands it15:34
JayFAny other questions/comments/discussions on in-progress work streams?15:35
TheJuliaI can re-review, I think the last time I looked at the code I had high confidence in it15:35
JayFsame15:35
JayFI just want to actually test it15:35
TheJuliaSince it is all well walked pattern changes15:36
TheJulialets sync after the meeting on it15:36
JayFack, going to move on15:36
JayFNothing listed for RFE Review; skipping that section.15:36
JayF#topic Open Discussion15:36
JayFI had one item for here:15:36
JayFPTL and TC nominations are open. I strongly encourage Ironic contributors to run for PTL and/or TC. If you're interested in being PTL talk to me. 15:37
JayFIf nobody else has self-nominated for PTL by midweek, I will re-nominate myself for a third term.15:37
JayFThat's all I had, just wanted to draw attention there.15:38
JayFAnything else for open discussion?15:38
dtantsurDemocracy is good, letting Jay to get a break is even better!15:38
dtantsurGo people go!15:38
dtantsurSo, yes, one funny bug15:38
dtantsurhttps://bugs.launchpad.net/ironic/+bug/2032377 was brought to my attention by my fellow operator15:38
JayFEh, I don't mind being PTL tbh; I just appreciate that we cycle leadership and don't wanna break tradition :)15:38
dtantsurit's stupidly simple, but I have no idea how to work around it cleanly15:38
JayFwe can't really leaving cloud-init AND glean installed on the same IPA image15:39
JayFthat's the bug there, yeah?15:39
dtantsurnope15:39
TheJuliait *sounds* like the image has a pre-baked config drive15:40
TheJuliaand we don't find it15:40
dtantsurImagine we're cleaning a node that had a configdrive. And IPA has a configdrive.15:40
TheJuliaand we create a new one15:40
TheJuliaand *boom*15:40
TheJuliaoh15:40
JayFdtantsur: ah, in IPA world we should never ever not ever respect config on disk15:40
JayFthat is potentially a security bug15:40
dtantsurRight. But we do.15:40
TheJuliaso it is when the ramdisk boots, it finds/attaches the config drive data embedded in the iso15:41
JayFwe'd almost need glean to have an option to filter block devices and/or look for a different label15:41
TheJuliaand doesn't unmount it for operations it sounds like?15:41
dtantsurTheJulia: still simpler. Glean is looking for a configdrive. There are two: the one it should use (in the CD) and the old one on disk.15:42
TheJuliathe one on the disk shouldn't be a block device...15:42
TheJuliayet15:42
TheJuliawut15:42
JayFfor cleaning15:42
JayFyeah?15:42
dtantsurTheJulia: how can it be NOT a block device?15:42
TheJuliait would need to be attached to the loopback to become a device15:42
dtantsurJayF: cleaning is the biggest problem; after the cleaning the rogue partition will be gone.15:42
JayFor on a not-cleaned device doing a second deploy15:42
TheJuliawe need to look at the simple-init code and if we can get a ramdisk log that would be super helpful15:43
dtantsurTheJulia: you're talking about a file; configdrive is a partition (on disk) or a whole device (CD)15:43
TheJuliaOH15:43
TheJuliathe whole CD is labeled config-215:43
TheJuliawut15:43
dtantsuryep, that's how DHCP-less works in Ironic15:43
JayFthat is how ISO-based configdrives work in VM15:43
JayFas well15:43
TheJuliaI didn't realize that was how vmedia ramdisk worked15:44
dtantsurTheJulia: not always, only when Node.network_data is used15:44
TheJuliaOH15:44
TheJuliawheeeeeeeeeeeeee15:45
JayFYeah, I agree that's a nasty bug.15:46
JayFI agree I don't know how we fix it without changes in glean.15:46
JayFAND service steps will increase the scope of the bug15:46
dtantsuryeah. maybe we should talk to ianw, I can drop him an email15:47
TheJuliaI think we don't have enough information to fully comprehend the bug since if they have a pre-existing configuration drive, and one based on the image itself, it is sort of a case we never expected15:47
TheJulia++15:47
JayFeven like glean-use-part-uuid=AAAA-BBBB-CCCC-DDDD 15:48
TheJuliasince we expected the node to be cleaned, but this is an instance as a ramdisk with its own config drive15:48
JayFin the kernel command line15:48
dtantsurTheJulia: we do actually. IPA has a configdrive because it's how DHCP-less works. The disk has a configdrive because it's cleaning after deployment.15:48
JayFjust some way for Ironic to signal to glean "use this one"15:48
JayFyeah that's why the bug is tricky; both configdrives are valid just not in the same context15:48
TheJuliayeah, but we know it at that point, they are doing it themselves outside of cleaning, which is how I'm reading the bug15:48
dtantsurmmm?15:49
TheJuliahmm15:49
TheJuliawe really need to talk with the filer of the bug and ask questions15:49
dtantsurI talk to him all the time but I don't know which questions you have in mind15:50
dtantsurWe know what is going on, we don't know how to fix it15:50
TheJuliaWell, I'm confused on what exactly they are doing15:50
dtantsur1) Normal deployment with DHCP-less IPA; 2) Instance tear down; 3) Boom15:50
dtantsur3) *Boom with DHCP-less cleaning15:51
JayFConfigure node.network_data. Deploy the node. Clean the node. At clean time both the original deployed configdrive AND the IPA-node.network_data configdrive exist.15:51
TheJuliaso, our ipa ramdisk should have the smarts to know where to boot from, and not hunt to the OS disks when booted that way15:51
TheJulia*should* being the keyword there15:52
JayFyep, that's what dtantsur and I were talking about with glean15:52
JayFb/c we'll need to tell glean exactly which partition to help it dedupe15:52
TheJulia... or just explicitly run it15:52
dtantsurfor contrast, that's the current glean's logic: https://opendev.org/opendev/glean/src/branch/master/glean/init/glean-early.sh#L34-L3815:52
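For readers following the log: glean-early.sh roughly scans block devices for the first filesystem labeled `config-2` and mounts that. A minimal, hypothetical Python simplification (not glean's actual code; the blkid-style input format is assumed) shows why two `config-2` devices make the choice an enumeration-order lottery:

```python
# Hypothetical simplification of glean's label-based configdrive
# selection: take the first device whose LABEL is "config-2".
# With two such devices present (the DHCP-less IPA ISO on /dev/sr0
# and a stale configdrive partition left on disk), whichever one
# blkid happens to list first "wins" -- the ambiguity behind
# bug 2032377.

CONFIGDRIVE_LABEL = "config-2"

def pick_configdrive(blkid_lines):
    """Return the first device labeled config-2, else None.

    blkid_lines: iterable of strings shaped like blkid output,
    e.g. '/dev/sr0: LABEL="config-2" TYPE="iso9660"'.
    """
    for line in blkid_lines:
        device, _, attrs = line.partition(":")
        if f'LABEL="{CONFIGDRIVE_LABEL}"' in attrs:
            return device.strip()
    return None
```

For example, if both `/dev/sda2` (stale, on disk) and `/dev/sr0` (the ramdisk ISO) carry the label, the result depends purely on which one is enumerated first.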
opendevreviewMerged openstack/bifrost master: Remove Fedora from the CI  https://review.opendev.org/c/openstack/bifrost/+/89212315:52
TheJuliayeah, that is semi-problematic15:53
dtantsurTheJulia: running Glean manually is non-trivial since it gets triggered from udev15:53
TheJuliayeah15:53
JayFI mean, couldn't we do something like:15:54
JayF1) Disable glean from autorun, via udev and everything else15:54
JayF2) On IPA startup, look for ipa-network-data=blah and if it exists, do some mounting then run LN 53 from glean-early.sh?15:55
TheJuliaglean uses integrated network interface enumeration15:55
JayFso you can't run it late?15:55
TheJulianot really15:55
JayFyeah, then we need glean to get some sorta hint 15:55
JayFthat says "no really, this configdrive"15:55
dtantsurYou probably can, but it's going to happen quite late15:55
dtantsure.g. currently IPA is After:network-online15:55
JayFgood point15:55
JayFwe'd basically have to write a separate unit, at which point we've reinvented the wheel15:56
dtantsurWe could have our own service that goes Before15:56
dtantsurright15:56
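The "own service that goes Before" idea floated above might look roughly like the following unit; this is purely a sketch, and the unit name, ordering targets, and the `/usr/local/bin/ipa-pick-configdrive` script are all invented for illustration:

```ini
# Hypothetical systemd unit: run a configdrive-selection step in the
# IPA ramdisk before networking comes up (and therefore before
# glean's udev-triggered configuration runs). All names are
# assumptions, not an existing interface.
[Unit]
Description=Select the correct configdrive before network setup
DefaultDependencies=no
Before=network-pre.target
Wants=network-pre.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/ipa-pick-configdrive

[Install]
WantedBy=multi-user.target
```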
* JayF votes for kernel cli or on-disk glean config that points it explicitly at a partition uuid15:56
JayFsince we should know the partition uuid at create time, yeah?15:56
dtantsurokay, lemme talk to Ian, maybe he has an opinion too15:56
dtantsurJayF: we can use /dev/sr0 really..15:57
JayFdtantsur: that is going to potentially vary based on hardware15:57
dtantsuralso true15:57
JayFdtantsur: which is why I'd prefer a uuid-based approach15:57
JayFAight, we have 3 minutes left15:57
JayFany items remain for open discussion?15:57
kubajjI have a quick question regarding the docs. Is there any preferred location where I should describe the hierarchy of kernel/ramdisk parameters? I did not find the current state described anywhere.15:57
clarkbyou can manually invoke the glean script with the right network info15:58
clarkbthats all the udev systemd integration does15:58
JayFkubajj:  doc/source/install/configure-glance-images.rst15:58
JayFkubajj: from a cursory running of rg deploy_ramdisk doc/15:59
clarkbyou could also potentially use fancier udev rules to do what you want. udev is magic but also indecipherable15:59
dtantsurclarkb: possibly, but then we need to stop using simple-init15:59
dtantsur.. which may not be a terrible thing because then we can include whatever we invent in all ramdisks (currently it's opt-in)15:59
* JayF hears the bells ring for the top of the hour15:59
JayF#endmeeting15:59
opendevmeetMeeting ended Mon Aug 21 15:59:54 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:59
opendevmeetMinutes:        https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-08-21-15.00.html15:59
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-08-21-15.00.txt15:59
opendevmeetLog:            https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-08-21-15.00.log.html15:59
JayFWe can keep chatting obviously but no need to log our solutioning :D 16:00
JayFfyi; I'm out of the office (well, at least out off-and-on) for the remainder of the day; if you need me for something send a message and I'll get around to it16:00
* TheJulia goes and looks for her voltmeter to check her electrical after this hurricane16:04
rpittaugood night! o/16:04
dtantsurclarkb: do you think it would be acceptable for glean to optionally read the configdrive source from kernel params?16:05
dtantsura label or a UUID, dunno yet16:05
clarkbcurrently glean looks for the right device labels. I could see making that configurable I guess16:06
clarkbprobably a good idea to cross check with cloud-init to see if they do something similar yet and mimic that if so16:07
dtantsuryep, makes sense16:07
dtantsurclarkb: https://cloudinit.readthedocs.io/en/latest/reference/datasources/nocloud.html seems to be quite relevant16:11
dtantsuror https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v1.html#network-config-v116:13
clarkbI don't think we want anything quite so flexible in glean. It very specifically does config drive and that is it16:13
clarkbI think we would reject anything like that. config drive is the system. If we need to use different labels for coordination purposes that is probably fine though16:14
dtantsuryeah, it's probably the easiest and the least invasive way.16:14
clarkbit is interesting that cloud-init uses a different hard coded label though16:14
dtantsurclarkb: it's for NoCloud. For ConfigDrive, they support the same labels.16:15
clarkbrather than accepting a config option for that. I guess using the local fs may be too late though hence the kernel boot param?16:15
clarkbright16:15
dtantsurclarkb: the local fs is a bit problematic for us because we cannot affect it in runtime so easily in Ironic16:15
clarkbaha. In that case checking boot params for a configdrivelabel attribute (or similar) seems reasonable. If not present we default to the current name(s) else use the supplied value16:16
dtantsuryep16:16
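The boot-param behavior clarkb describes (use the supplied label if present, otherwise the current default) could be prototyped along these lines; the `glean_configdrive_label` parameter name is an assumption for illustration, not an agreed interface:

```python
# Sketch of the proposed override: if the kernel command line
# carries a configdrive-label parameter (the name
# "glean_configdrive_label" is invented here), use its value;
# otherwise fall back to the standard "config-2" label.

DEFAULT_LABEL = "config-2"
OVERRIDE_KEY = "glean_configdrive_label"

def configdrive_label(cmdline):
    """Return the configdrive label to search for.

    cmdline: the contents of /proc/cmdline as a single string.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == OVERRIDE_KEY and value:
            return value
    return DEFAULT_LABEL
```

For example, `configdrive_label("ro quiet glean_configdrive_label=ipa-cfg-7f3a")` would yield the override, while a cmdline without the parameter falls back to `config-2`.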
opendevreviewJakub Jelinek proposed openstack/ironic master: Introduce default kernel/ramdisks by arch  https://review.opendev.org/c/openstack/ironic/+/89081916:20
JayFCan we, instead of label, specifically use partition uuid?16:23
JayFI guess we could use a uuid as a label, but it just seems like we do not want that to be a static string that is guessable by the tenant on the instance16:23
clarkbJayF: config drive currently explicitly uses labels. I'd like to stick to config drive as much as possible here16:23
clarkbI mean you can check uuids but config drive isn't built that way16:23
clarkbso its extra code to write and test in a difficult to test and debug part of these systems16:24
JayFThis is explicitly a case of disambiguation though, this would be two config drives, both validly labeled, and indicating which one is the one to use16:24
clarkbright, maybe take that up with nova ?16:24
JayFBut either way, my only meaningful concern is that we're able to use a nonpredictable string as label or UUID or whatever16:24
dtantsurJayF, clarkb, that's what I wrote on the bug: https://bugs.launchpad.net/ironic/+bug/2032377/comments/1. Makes sense?16:24
JayFNah this one is entirely our bug 🫥16:25
JayFUsing config drive for IPA while there's also a config drive on the instance is not so fun16:25
clarkbwell nova says a config drive is a specific label16:25
opendevreviewMerged openstack/metalsmith master: Add centos9 based job  https://review.opendev.org/c/openstack/metalsmith/+/86937416:25
clarkbwhich is ambiguous if two fses have the same label16:25
dtantsurclarkb: note that one of the configdrives is on the service ramdisk ISO16:25
JayFdtantsur: that looks good16:25
dtantsurthe final instance does not see it16:25
clarkbI just selfishly want to avoid writing code that has to deal with uuids and labels becaues debugging glean is super painful16:26
dtantsurclarkb: it's not too bad, just instead of LABEL=... here https://opendev.org/opendev/glean/src/branch/master/glean/init/glean-early.sh#L44 you'd have UUID=16:27
dtantsurin the proposal, we're going to pass it for you even16:27
TheJuliawheee looks like one of our solar arrays is down16:28
clarkbI guess that isn't too bad since we lean on blkid. May still create debugging paths that are different but unlikely as long as system tools are reliable16:28
clarkbmostly I want to be careful we don't diverge from what a config drive is too much16:29
clarkbcurrently it is defined as a hardcoded label on a fs. Making that configurable isn't too much of a stretch16:29
JayFOne positive thing to note, is that ironic relies heavily on those system tools being reliable already so we should know if something happens that would likely break glean16:29
clarkbbasically glean is a highly opinionated config drive only instance configurator. If you need fancy features cloud-init is where you should look16:30
clarkbI think "please use device label=foobar" isn't too much of a stretch here. And probably use this uuid isn't either given how straightforward that appears to be16:30
clarkbTheJulia: hopefully flooding hasn't affected you too much?16:30
TheJuliawell, a train just derailed at the bridge nearby16:31
TheJuliaso.......16:31
TheJuliayeah16:31
dtantsurOo16:31
clarkboof16:31
TheJuliayeah, the highway became a literal river16:31
clarkbya I saw a river is running over I1016:31
TheJuliaI'm not far from that actually16:32
dtantsurThere have been quite a bit of water in Moscow as well: https://t.me/ostorozhno_novosti/18847 https://t.me/sotaproject/6476016:34
TheJuliagood news, and bad news16:42
TheJuliaGood: OVN DHCP w/v4 works \o/ Bad: Downloads through OVN stall out: https://d1260bf3e4063ee5a28c-2b5476c4783b3d94c184e1ce73ec8a2b.ssl.cf5.rackcdn.com/885087/48/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool/037d878/controller/logs/ironic-bm-logs/node-0_console_2023-08-21-16%3A02%3A49_log.txt16:42
opendevreviewMerged openstack/ironic master: Retool sqlite retries  https://review.opendev.org/c/openstack/ironic/+/89133316:43
hjensasTheJulia: That is with https://review.opendev.org/c/openstack/ironic/+/885087 ?18:41
hjensasOn OVN topic, I talked to the neutron folks today, and agreed to have a go at testing https://review.opendev.org/c/openstack/neutron/+/890683 (DHCPv6) w/OVN.18:45
TheJuliahjensas: yes, got a pcap and trying to figure out what exactly is going on. I've sort of got a hunch, but I'm trying to understand why I have like 16kB packets18:49
TheJuliain my pcap18:49
hjensasok, I am going to use your patch locally - start with v4 and then switch it to v6 and add that neutron patch.18:53
hjensasTheJulia: let me know if you figure out why it's can't download the kernel/ramdisk. I'll dig as well if I see the same issue locally.18:55
TheJuliapcap file https://usercontent.irccloud-cdn.com/file/b4v3L20Y/tap-node-0i1-a.pcap18:55
TheJuliaSo it seems to be ovn, and eventually I lose a packet somehow18:56
TheJulia... I wonder if ovn is just overflowing a buffer18:56
TheJuliahjensas: changed out the networking to libvirt bridge against brbm ... i.e. not vepa mode, and can reproduce the exact same behavior where the connection just stalls out22:19
TheJuliait looks like we might be losing packets somewhere between the br-ex bridge and the actual port22:35

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!