Monday, 2023-10-16

opendevreviewTaketani Ryo proposed openstack/ironic master: Add the setting of memcached servers to keystone_authtoken  https://review.opendev.org/c/openstack/ironic/+/89818300:27
rpittaugood morning ironic! o/07:11
masgharGood morning!08:10
dtantsurmorning folks, happy Monday08:29
opendevreviewPierre Riteau proposed openstack/bifrost stable/victoria: CI: Update cached cirros image to 0.5.3  https://review.opendev.org/c/openstack/bifrost/+/88588009:45
iurygregorygood morning Ironic11:10
iurygregoryyay habemus Firmware Interface in gophercloud :D12:05
rpittauGreat :)12:18
iurygregoryyeah, now I think i just need 2 things to have in metal3 :D 12:34
dtantsuriurygregory: FYI https://github.com/gophercloud/gophercloud/pull/279112:39
iurygregoryinteresting12:40
*** drannou_ is now known as drannou13:18
TheJuliagood morning13:25
iurygregorygood morning TheJulia =)13:25
opendevreviewDamien RANNOU proposed openstack/ironic-python-agent-builder master: When creating the rescue user, check if we are on Debian or RH based in order to use the right sudo group  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/89832214:27
opendevreviewDamien RANNOU proposed openstack/ironic-python-agent-builder master: When creating the rescue user, check if we are on Debian or RH based in order to use the right sudo group  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/89832214:48
JayFo/15:00
JayF#startmeeting ironic15:00
opendevmeetMeeting started Mon Oct 16 15:00:13 2023 UTC and is due to finish in 60 minutes.  The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'ironic'15:00
iurygregoryo/15:00
dtantsuro/15:00
TheJuliao/15:00
JayFWelcome to the Ironic meeting! A reminder that this meeting is held under the OpenInfra Code of Conduct available at https://openinfra.dev/legal/code-of-conduct.15:00
JayF#topic  Announcements / Reminder 15:00
JayF#info      Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio: https://tinyurl.com/ironic-weekly-prio-dash15:00
JayF#topic Action items from previous meeting15:01
JayFI need to carry this over15:01
JayF#action JayF to backport ngs_save fix for networking_generic_switch, cut a bugfix-version (not branch) release of it15:02
JayFsorry about that, it fell off my radar15:02
JayF#topic Caracal Release schedule15:02
JayF#link https://releases.openstack.org/caracal/schedule.html15:02
JayFTake note, we have a release schedule.15:02
JayFAny related commentary or discussion?15:02
TheJulianothing from me15:03
rpittauo/15:03
JayF#topic October PTG15:03
JayF#info Topics/schedule have been aligned for PTG, please review15:03
JayF#link https://etherpad.opendev.org/p/ironic-ptg-october-202315:03
JayFAs always, there is some flexibility, please review and make noise if any proposed thing is a hardship.15:04
JayFAny related commentary or discussion on PTG?15:04
TheJulianothing on my end15:05
JayFmoving on.15:05
JayF#topic Ironic CI Status15:05
JayFAnything of note about Ironic CI?15:05
TheJuliaLast week the cirros mirror went offline, it was down for ~4-6 hours, I think on Thursday. Rechecks may be needed but I may have already done them on the open changes from that time window15:06
* TheJulia doesn't remember anymore15:06
JayFAight. In general it's the quiet time so  I'm not surprised the only report is a hard-break.15:06
JayF#topic RFE Review15:06
JayFOne topic here15:06
JayF#link https://review.opendev.org/c/openstack/ironic-specs/+/89647415:06
JayFhttpboot support15:06
JayFPlease take note of the spec and review it15:07
JayFI will add it to my queue but probably will not be a +2 on it unless absolutely necessary due to the lack of personal experience with redfish gear15:07
TheJuliaIf anyone has questions, please feel free to ping me15:07
TheJuliaAlso, FWIW, we have an ilo version which basically does the exact same thing as prior art, it is not mentioned, but it uses the httpboot bmc interface15:08
JayFoh, neat15:08
JayFI don't have access to ilo hardware for my work, fwiw, either, even though my downstream uses it15:08
JayFThanks for proposing that spec. 15:08
JayF#topic Open Discussion.15:08
JayFI'm going to note, this is a bit of a plug but we've got 50 minutes, I feel OK doing it :P 15:09
JayFI'll be presenting at SeaGL on Nov 4, on Trust in an Open Source Community https://osem.seagl.org/conferences/seagl2023/program/proposals/98415:09
JayFI believe it'll be simulcast or rebroadcast digitally for those not in the area, if you're interested.15:09
iurygregorytks for sharing JayF =)15:10
rpittaunice15:10
JayFAnything else for open discussion?15:11
TheJuliaDo we have anything we need to discuss in advance of the ptg next week?15:11
TheJuliaJust thinking, it is next week15:11
JayFI was hoping folks would look at the etherpad after the meeting and maybe that would induce conversation as needed15:11
JayFyou and I went over it sync last week to get it scheduled15:12
JayFso I think mostly action lies on others now to do prework :)15:12
JayFI have more TC-PTG prework to do, too15:12
TheJuliaIt is also the week before the PTG, most of us need a little mental downtime plus time for administrative tasks15:13
TheJuliaso, mileage will vary this week.15:13
JayF++15:13
drannouI have one if you want: I don't know if you remember, but I asked a few weeks ago whether you had already worked on disk encryption. It seems that was not the case, so we moved on to checking how we could integrate SED disk encryption with ironic, barbican etc. We will try to make a POC15:13
JayFI personally have had a lot of pulls on that cord as well the last two or three weeks15:13
TheJuliadtantsur: so I'm thinking of completely ripping glean out. Any objections?15:13
dtantsurIt's not enough information for me to object or not :)15:13
TheJuliadtantsur: still supporting the case though, just not using external tools/logic to do parsing15:13
JayFdrannou: so, is there hardware-assistance in the encryption or what? 15:14
dtantsurIf you suggest to rewrite it ourselves.. I'll ask WHY15:14
JayFdrannou: would be interesting to see a writeup -- mailing list or RFE bug is OK if you're not spec-ready yet, about what you have in mind for on disk/orchestration15:14
TheJuliadtantsur: eh, we don't need to do *everything* it does, just a small portion of stuff15:14
TheJuliafor a very short transient time, turns out to be very little code, really15:15
dtantsurTheJulia: I think pretty much everything it does is networking.15:15
JayFrip out glean in what context?15:15
TheJulianetworking for long term consumption, we're in a ramdisk :)15:15
TheJuliavirtual media boot handling/parsing of config-215:15
dtantsurThe very goal of supporting several different networking backends is quite hard. Let alone testing that in the CI.15:15
JayFack15:15
TheJuliadtantsur: not if we're using the standard interface to make runtime changes15:16
dtantsurTheJulia: there are *at least* two of them15:16
JayFthere's a standard, cross-distro network interface?15:16
dtantsurNM and systemd-network15:16
TheJuliaiproute2 should be available15:16
JayFmmm15:16
JayFthat won't do everything network-data can specify15:16
JayFe.g. bonding with vlans15:16
TheJuliawe don't configure bonding15:17
drannouJayF: Yes, drives like NVMe support offloaded encryption: the device itself manages it. At power-on the disk is encrypted and waits for the key. The idea would be to boot into IPA, let IPA unlock the disk, and soft reboot onto the disk15:17
dtantsurwe can do it15:17
TheJuliaand vlans was simple15:17
dtantsurAnd using low-level tools behind the NM back is a risky approach15:17
TheJuliaokay, if someone were to do it manually, yes, they could articulate bonding15:17
dtantsurwdym manually? it's a part of network data.15:17
TheJuliaI'm thinking in the fully integrated case, we bind ports individually afaik15:18
TheJuliaso manually would be someone populating network_data and have no neutron15:18
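The trade-off debated above (iproute2 handles plain interfaces and VLANs easily, while bonds in network_data are harder) can be illustrated with a minimal sketch. It assumes the usual network_data.json layout with "links" and "networks" lists; the function name ip_commands, the mac_to_ifname mapping, and the choice to report bonds as unsupported are hypothetical and are not how IPA or Glean actually behave. It only builds argument lists and does not run anything.

```python
import ipaddress


def ip_commands(network_data, mac_to_ifname):
    """Build iproute2 argument lists for phy and vlan links (hypothetical helper).

    Returns (commands, unsupported_link_ids). Bonds are reported as
    unsupported instead of being configured.
    """
    commands = []
    unsupported = []
    names = {}  # network_data link id -> kernel interface name

    links = network_data.get("links", [])

    # First pass: physical links, matched to local interfaces by MAC address.
    for link in links:
        kind = link.get("type")
        if kind == "vlan":
            continue  # handled in the second pass, once parents are known
        mac = link.get("ethernet_mac_address", "").lower()
        if kind != "phy" or mac not in mac_to_ifname:
            unsupported.append(link["id"])
            continue
        names[link["id"]] = mac_to_ifname[mac]
        commands.append(["ip", "link", "set", "dev", names[link["id"]], "up"])

    # Second pass: VLAN links stacked on an already known parent.
    for link in links:
        if link.get("type") != "vlan":
            continue
        parent = names.get(link.get("vlan_link"))
        if parent is None:
            unsupported.append(link["id"])
            continue
        vlan_id = str(link["vlan_id"])
        name = f"{parent}.{vlan_id}"
        names[link["id"]] = name
        commands.append(["ip", "link", "add", "link", parent, "name", name,
                         "type", "vlan", "id", vlan_id])
        commands.append(["ip", "link", "set", "dev", name, "up"])

    # Static IPv4 addresses only; DHCP and routes are left out of this sketch.
    for net in network_data.get("networks", []):
        dev = names.get(net.get("link"))
        if net.get("type") != "ipv4" or dev is None:
            continue
        iface = ipaddress.ip_interface(f"{net['ip_address']}/{net['netmask']}")
        commands.append(["ip", "addr", "add", iface.with_prefixlen, "dev", dev])

    return commands, unsupported
```

A caller would get back the list of unsupported link ids and could decide whether to fail or to fall back to a fuller tool, which is the point of contention in the discussion.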
dtantsurWhich is what Steve Hardy is doing with Metal3 nowadays :)15:18
TheJuliaThe alternative is add additional backends to glean so we can test it in CI15:18
TheJulia(tinycore)15:18
dtantsurOr test with a real IPA image15:19
TheJuliacan't due to rax15:19
dtantsurCan we ask the Infra to not schedule us there? I feel like we're going too far to solve RAX issues already...15:19
TheJuliaor we rip out the fallback logic and just risk performance being a bigger issue15:19
JayFthat's what I was about to ask, dtantsur 15:20
clarkbwe have nested virt flavors. They can and do fail we ask people willing to use them to work with the cloud when that happens to figure it out as we are unable to debug for you15:20
clarkbthey are also in limited clouds so may go away15:20
clarkbbut haven't yet15:20
TheJuliaI can always go deal with trying to write a glean backend for tinycore, but.. I dunno since there is fear over nested virt availability. We opt into nested virt where available15:22
JayFdrannou: as long as there's an open standard for it, I don't see why we'd be in opposition to it. That being said; barbican is not a super active openstack project to be blunt, so that's the only piece that makes me nervous is taking a dep there.15:22
dtantsurA glean backend for tinycore is better than rewriting the whole Glean ourselves IMO15:22
TheJuliaIt is not the whole of glean, but if we're super fearful of just making runtime changes, then I can abandon the path I'm on15:23
drannouJayF: Yes I completely agree, but we would just use barbican for what it is : a secured key store15:23
dtantsurTheJulia: 80% of Glean is already too much Glean :)15:23
JayFdrannou: if I was implementing it in Ironic; given the standalone-use-cases of Ironic, and especially that it's deployed in e.g. metal3, I'd probably suggest that key store be made into an interface in Ironic so it can be pluggable15:23
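JayF's suggestion of a pluggable key store could look roughly like the sketch below. Everything in it is hypothetical: the KeyStore class, its put/get methods, the backend registry, and the in-memory backend are illustrative names, not an existing Ironic interface. A Barbican-backed store would be one more subclass and is deliberately not shown.

```python
import abc


class KeyStore(abc.ABC):
    """Hypothetical interface for storing per-node disk unlock keys."""

    @abc.abstractmethod
    def put(self, node_id, key):
        """Store the unlock key for a node."""

    @abc.abstractmethod
    def get(self, node_id):
        """Return the unlock key for a node."""


class InMemoryKeyStore(KeyStore):
    """Toy backend for illustration only; keys vanish when the process exits."""

    def __init__(self):
        self._keys = {}

    def put(self, node_id, key):
        self._keys[node_id] = key

    def get(self, node_id):
        return self._keys[node_id]


# Hypothetical registry; a standalone deployment would pick a backend by name.
_BACKENDS = {"memory": InMemoryKeyStore}


def load_key_store(name):
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown key store backend: {name}") from None
```

The design choice being illustrated is that callers depend only on put and get, so a deployment without Barbican (for example under metal3) can supply its own backend.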
TheJuliaThere was work a few years ago to do on-boot encryption and plug that into a remote keystore15:24
JayFdrannou: but I think it's obvious this is a feature that 'fits' Ironic; just write it up in an RFE and  add it to "RFE Review" on meeting agenda (likely meeting next week is cancelled for PTG)15:24
TheJulia.... CoreOS had it turned on for a short while.15:25
JayFdrannou: if we need more details, at that RFE review we might ask you to write a spec15:25
JayFhmm15:25
TheJuliaI'm trying to remember what it was called15:25
JayFI'm going to close the meeting; we've sorta devolved into general chat at this point15:25
JayF#endmeeting15:26
opendevmeetMeeting ended Mon Oct 16 15:26:26 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:26
opendevmeetMinutes:        https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-10-16-15.00.html15:26
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-10-16-15.00.txt15:26
opendevmeetLog:            https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-10-16-15.00.log.html15:26
clarkbalso I would get away from "solve the rax issues" it's not a rax problem15:26
clarkbit's a "nested virt is hard" problem and clouds don't want to bother with the pain15:26
JayFdrannou: to be clear; an RFE is just a bug ticket in bugs.launchpad.net/ironic 15:26
clarkbamazon doesn't do nested virt either15:26
TheJuliaclarkb: True, but highly variable performance on rax doesn't help *at all*15:27
clarkbTheJulia: I've said it before and I'll say it again :) we are very likely our own worst enemy (noisy neighbor) in most of these clouds15:27
drannouNever done an RFE before, Yes I can do it but I don't know if it will fit the rules :p15:27
clarkbin particular I believe that swapping jobs lead to iops being consumed to the detriment of test nodes across the hypervisors15:27
TheJuliaclarkb: I know, and I agree, which is also why we've made efforts to minimize our footprint on rax in particular15:28
drannouclarkb: the biggest problem with nested offload is the security, there was several CVE related to that on intel side15:28
clarkbdrannou: well it also just doesn't work. It crashes regularly when you get a kernel update in one of the three kernels involved15:28
clarkbif you tightly control the kernel in all three layers then you can probably get it working pretty reliably but that isn't the case for public clouds15:29
TheJuliadrannou: so is the idea you're hoping for the encrypted glance image functionality?15:29
drannouit depends of the workload, most of the time "it works", but of course on specific load, it might not be stable enough15:29
JayFfor cloud providers, "most of the time it works" is hell15:30
JayFdrannou: https://bugs.launchpad.net/ironic/+bug/2034953 is an example of an RFE bug; generally speaking more detail is better. Sometimes I can find it helps me, even if it's "heavier" process, to jump to writing a spec which are these: https://review.opendev.org/c/openstack/ironic-specs/+/896474 -- the specs are good to look at for a checklist of stuff to think about even if15:30
drannouJayF: so true...15:30
JayFjust writing an RFE bug15:30
JayFdrannou: I think it's obvious we are all friendly and stuff, so just take a swing and we'll help if it's not in line :D 15:30
drannouTheJulia: well I did not yet check the Glance part, even may be it's not needed: I was more thinking about an ironic driver15:31
drannoulike BMC, reboot etc15:31
JayFyeah the piece I'm a little interested in is that since it sounds like this unencrypt happens in-band15:32
TheJuliaThe conundrum on the glance encrypted image stuffs is largely it is all written from an "everything is a VM" perspective and the infrastructure is LUKS under the hood15:32
JayFand Ironic gets outta the loop once we get a workload provision15:32
JayFis who does the decrypt once the workload is released15:32
drannouwhen the disk is unlocked, it has no impact on the final OS (to be confirmed), so being able to enable it or not depending on IPA introspection would be better15:32
TheJuliawhich doesn't work on the hardware side of things short of decrypting the entire thing15:32
JayFTheJulia: this sounds more like formalization around the security locking and key management stuff in NVMe protocols15:33
TheJuliaJayF: yeah15:33
JayFTheJulia: I am ... suspicious that the model will mesh with ours but it's hard to talk/think about without having the full picture written down to reason about15:33
TheJuliaJayF: the glance->VM encrypted image stuff?15:34
TheJuliaor NVMe security?15:34
JayFno, the stuff drannou is talking about re: managing NVMe encryption keys and locking/unlocking drive15:34
TheJuliaYeah, that, I dunno15:34
drannouJayF: I made a simple test on hardware I have: in rescue, locking the NVME with a pass, so it wipes out the disk15:35
TheJuliaOne of the paths I've seen is an unencrypted intermediate ramdisk which can obtain the key and do the unlock, but somewhere that key has to be housed15:35
drannouI manually unlock, and install a debian on it15:35
JayFTheJulia: that "somewhere" in drannou's proposal is barbican, but yeah15:35
JayFI'm interested to see it end-to-end15:35
TheJuliayeah, much more a customized diskimage-builder thing imho, but we would need to be aware to do the initial needful15:36
drannouafter a hard reboot, the disk is locked, nothing can boot, but after putting back the rescue, manually unlocking the disk, and soft rebooting, the host boots normally15:36
JayFmainly curious how the two cases look compared: on-disk deployment (we don't have netbooting anymore so it'll have to manage its own boot once deployed)15:36
JayFand deployment itself15:36
JayFbest case is it works with Ironic and the implementation is a deploy template with a step15:36
TheJuliayeah15:36
JayFdrannou: that is ... unlikely to be a good fit with the Ironic model then IMO15:37
JayFif a user was on a flat network, and using this NVMe style security, a locked disk could put that machine at more security risk because it would potentially fallback to pxe15:37
JayFit would almost require using like ... idk ramdisk deploy driver? with no cleaning?15:38
JayFso you just deploy onto it to unlock the data on the disk15:38
TheJuliayou could15:38
TheJuliasort of yeah15:38
TheJuliayou'd need to have an initial deploy but that too could in theory be a step15:38
JayFI think it's the sort of thing that could be hacked into Ironic to make it work, like I mention as a step15:38
drannouIMO there are two possible approaches: use IPA to unlock the device, or create a small IPA-like image that we could flash on the boot sector of the NVME (less interesting for me)15:39
TheJuliabut ramdisk doesn't execute agent side steps at all on boot15:39
JayFbut I don't think that kind of failure mode is something that'd make sense as a first class Ironic thing 15:39
JayFUNLESS/UNTIL we support that "devices that can never power off" stuff TheJulia has queued for PTG15:39
JayFbecause this is a case of a "device that we can never intentionally power off" essentially15:39
TheJuliaeh, one panacea might not, but multiple may!15:39
drannouJayF: Yeah exactly, something like: instead of "clean" argument, ironic says to IPA: unlock @ reboot15:39
JayFthe thing is, IPA only runs during provisioning actions15:40
JayFwell, if we call node servicing a provisioning action15:40
TheJuliawell, we could always drop in a kexec step too15:40
JayFIPA is outta the picture once the node has an OS on it and is puttering along15:40
JayFso you'd have to do something like rescue mode or like I said, wire up a ramdisk deploy as an "unlocker"15:40
TheJuliaand then there could be some sort of path to always send down deployment as setup15:40
TheJuliadunno, that gets a little complex in clustered setups as there are some caveats15:41
JayFpossible with Ironic ✓15:41
drannouThe problem is that IPA only works on the provisioning vlan, so we would need to do something like rescue: when the customer asks for a hard boot, we need to boot the host on the provisioning network, boot IPA, unlock the disk, and go back to the user network15:41
JayFgood idea to implement upstream as a happy path  ⃠15:41
JayFyep, exactly15:42
JayFbut rescue is a little... delicate15:42
TheJuliaWell, you can do per-node provisioning, but then you also have security risks... less so with something like httpboot, but yeah15:42
JayFbecause it's timing based as it is15:42
JayFit's successful as a feature to help recover otherwise screwed-up nodes, but it's not something I'd want as part of my usual workflow15:43
drannouSo yes it's a complicated subject, that would need to change a lot of things15:44
TheJuliasteps allow for workflows to be defined though15:44
JayFhmmm15:45
TheJuliakind of going back to deploy time raid15:45
JayFthe reason this doesn't mesh with Ironic model15:46
JayFis that we can't model it in our states15:46
JayFif we were to support something like this as a grade-A thing, we'd need to model "locked" as a state, potentially15:46
TheJuliaWe'd almost need a "ASSISTEDACTIVE"15:46
drannouJayF: the 'lock/unlock' disk state in the API ?15:46
JayFwell, what TheJulia is saying is closer15:46
TheJuliameaning we always need to do something to turn the machine on15:47
JayFwe'd need a state either saying "provisioned with OS but inaccessible"15:47
JayFor "provisioned with OS but only accessible because MAGIC"15:47
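The modelling problem raised here (a node that is provisioned but unusable until something unlocks it) can be made concrete with a toy transition table. The state and event names below are hypothetical and do not exist in Ironic's state machine; this is only a sketch of what a "locked" or "assisted active" state would have to express.

```python
# Hypothetical state names; none of these are proposed Ironic states.
ACTIVE = "active"
ASSISTED_ACTIVE = "assisted active"  # running, but only because it was unlocked
LOCKED = "locked"                    # provisioned with an OS but inaccessible

# (current state, event) -> next state
TRANSITIONS = {
    (ACTIVE, "power_cycle"): ACTIVE,           # an ordinary node comes back by itself
    (ASSISTED_ACTIVE, "power_cycle"): LOCKED,  # the drive relocks when power is lost
    (LOCKED, "unlock"): ASSISTED_ACTIVE,       # e.g. a ramdisk boots and supplies the key
}


def next_state(state, event):
    """Return the next state, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None
```

The asymmetry between the first two rows is the point: a power cycle is harmless for a normal active node but requires an extra unlock action for the assisted one.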
TheJuliadrannou: for the redfish api?15:47
drannouTheJulia: I was just asking what JayF has in mind, but he already answered :)15:48
TheJuliaahh, okay15:48
TheJuliasome sort of "we need to do a thing" state in the state machine wouldn't be awful, really. Just the caveats I was thinking of in terms of cluster management. You'd basically want tinyipa sized artifacts for inband operations15:48
drannouyeah that's also something I had in mind: how to "know". In fact I was thinking to simply say: if the power state is off, there is no other way to make a nova start or reboot15:49
TheJuliathat is if you can use such and don't need 600MB NIC card firmware15:49
JayFSo to be clear: 15:49
JayF1) The quickest path is the quick and dirty step based approach, either with rescue or with swapping deploy driver to ramdisk deploy to unlock drives15:49
JayF2) this is the longer term "prettier" approach which probably would have to be ordered *after* support for hardware which can never be powered off (which is being discussed at PTG next week)15:50
drannouI feel that, for security topic, there is no "quick and dirty" possibility :D15:50
JayFwell, (1) would be all operator config, not upstream code15:50
JayFyou can do all you want in your config that I never have to see or be responsible for15:50
JayFand believe me, people do LOL15:50
drannouso I will try to make an RFE, may be it will be too simple as a first step, but we might be able to iterate on it15:52
* TheJulia punts the IPA changes to teach to read/just run ip commands to setup networking15:52
JayFgetting the issue filed and on the meeting agenda is the best first start15:53
JayFalthough I'm about to cancel next week's meeting :D 15:53
TheJuliaIt makes sense to do so, tbh (the meeting)15:57
JayFwell, I think it might actually overlap with scheduled PTG sessions 16:00
JayFso it's not really optional16:00
JayFjust didn't mention it in the meeting16:00
rpittaugood night! o/16:05
opendevreviewJulia Kreger proposed openstack/ironic-python-agent master: WIP/DNM: Get rid of simple-init (but keep glean)  https://review.opendev.org/c/openstack/ironic-python-agent/+/89551921:10
JayFTheJulia: thought: we need a way, other than separate drivers, for hardware-specific cleaning steps in-band, perhaps. Lighter weight. akin to hardware managers? IDK ...21:24
JayFTheJulia: just thinking of stuff like, ilo had a built in config reset which is useful for cleaning; but that's a bad reason for it to have a whole driver vs just redfish + a single ilo specific bit21:24
JayFjust trying to think of new mental models for thi21:24
JayF*this21:24
TheJulia So the challenge is in the ilo case it is exposed as a step on their driver21:26
JayFthink about this in a greenfield world21:26
JayFnot as it is today21:26
JayFthen like, we can try to mush it into the existing model21:26
TheJuliaOh, yeah, sure if user selected and enabled on a single Interface21:26
TheJuliaNeed a conditional model for automatic operation21:27
JayFWell think about in a world where automatic cleaning could be driven by a to-be-designed step template21:27
JayFthen you move away from a world where everything has to be integrated and automatic and magic21:27
JayFand into one where I can opt into pulling things from more varied places21:28
TheJuliaStill conditional at some level21:28
JayFyeah, I guess so21:28
JayFthis just turns rapidly into21:28
JayFmain driver+sidecar21:28
TheJuliaToday, we have a deploy template, and we’re explicitly told what to use21:28
JayFor just redfish-{blah} type drivers21:28
TheJuliaIn that world, we sort of have to guess or be told explicitly21:28
JayFyeah21:28
JayFbecause of the undiscoverability, by nature, of some of these hardware features21:29
TheJuliaYeah, which sort of lands us on hardware driver based sort of defaults too since it is the lowest common denominator to a class of gear21:29
TheJuliaOr some sort of “can I execute, and try to execute” model for some steps21:30
JayFnow, from a flip side perspective21:30
TheJuliaOr both21:30
JayFthese sort of "advanced features"21:30
JayFare unlikely to be something people want magically enabled in general21:30
TheJuliaYup21:30
JayFso potentially having the ability to expose them but only work when flipped on explicitly isn't too horrible21:30
TheJuliaYeah, if they are supported. That is another conundrum21:31
JayFI like the idea of node.automated_clean = name_of_step_template somehow enabling this21:31
JayFbut we'd almost need some new kind of nerfed interface implementation21:31
JayFthat only provides steps and has limitations on what it can do21:31
JayFlike hand it a redfish client or something21:31
JayFlike it could never be the whole driver; but it could be a step mix in21:32
JayF(I'm still not to the point of saying this is an idea, or a good idea, I'm just sorta trying to explore the space)21:32
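The "step mix-in" idea being explored (something that only provides steps, is handed a client, and runs only when named explicitly in a template) might be sketched as follows. All names here are hypothetical: StepMixin, VendorResetMixin, run_template, the FakeClient stand-in, and the "/hypothetical/reset" path are invented for illustration and do not correspond to any Ironic, Redfish, or iLO API.

```python
class StepMixin:
    """Hypothetical step-only provider; it can never be the whole driver."""

    name = "base"

    def __init__(self, client):
        # The only capability handed in, e.g. an already configured BMC client.
        self.client = client

    def steps(self):
        """Return a mapping of step name to a zero-argument callable."""
        return {}


class VendorResetMixin(StepMixin):
    """Illustrative vendor-specific add-on exposing a single step."""

    name = "vendor_reset"

    def steps(self):
        return {"reset_bmc_config": self._reset}

    def _reset(self):
        # Placeholder path; a real vendor endpoint is not assumed here.
        return self.client.post("/hypothetical/reset")


class FakeClient:
    """Stand-in client that records calls instead of talking to hardware."""

    def __init__(self):
        self.calls = []

    def post(self, path):
        self.calls.append(path)
        return "ok"


def run_template(template, mixins):
    """Run only the steps the operator listed explicitly in the template."""
    available = {}
    for mixin in mixins:
        for step_name, func in mixin.steps().items():
            available[f"{mixin.name}.{step_name}"] = func
    missing = [step for step in template if step not in available]
    if missing:
        raise LookupError(f"steps not provided by any mix-in: {missing}")
    return [available[step]() for step in template]
```

Nothing runs unless the template names it, which matches the opt-in behaviour discussed for these advanced features.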
ashinclouds[m]Nothing stops us from mixing today, just nobody has asked and most drivers have kept the value add stuff behind their libraries on nonstandard interfaces. If appropriate delineation existed, it’s not a big deal21:33
JayFah, in the actual hardware the fancy features are gated in the proprietary side of the bmc?21:34
JayFthat craters any potential at all for extracting value here then21:34
TheJuliaJayF: often, yeah by proprietary apis or by license keys22:05
JayFI mean, if that's the case, may those features languish in unsupported hell forever. I don't wanna expend energy making $vendor money at the expense of openness.22:19
