Nick | Message | Time |
---|---|---|
opendevreview | Merged openstack/ironic stable/2023.2: Add missing compatibility between idrac and redfish firmware https://review.opendev.org/c/openstack/ironic/+/902074 | 00:40 |
opendevreview | Merged openstack/ironic-python-agent master: Fix vmedia network config drive handling https://review.opendev.org/c/openstack/ironic-python-agent/+/895519 | 01:10 |
opendevreview | Merged openstack/ironic-python-agent master: Parse efibootmgr type and details https://review.opendev.org/c/openstack/ironic-python-agent/+/899775 | 01:10 |
opendevreview | Merged openstack/ironic-python-agent stable/2023.2: improve multipathd error handling https://review.opendev.org/c/openstack/ironic-python-agent/+/901488 | 01:10 |
stevebaker[m] | TheJulia: OK, I don't think assuming efibootmgr returning utf-16 holds in all cases https://zuul.opendev.org/t/openstack/build/2fd82ff944d045918d1bea322a3d60ff/log/controller/logs/ironic-bm-logs/node-0_console_2023-11-28-20:49:06_log.txt#3129-3131 | 01:24 |
stevebaker[m] | TheJulia: when I revert that change and set use_standard_locale=True then it works https://zuul.opendev.org/t/openstack/build/7e4ba12e165145d2a8aad98445bf4aa6/log/controller/logs/ironic-bm-logs/node-0_console_2023-11-28-20:39:00_log.txt#3132-3151 | 01:25 |
opendevreview | Merged openstack/networking-generic-switch stable/zed: Fix regression plugging 802.3ad port group https://review.opendev.org/c/openstack/networking-generic-switch/+/900989 | 01:31 |
TheJulia | stevebaker[m]: so.. is the answer just to look at /sys/efi directly then? | 03:03 |
TheJulia | I'm not sure how we otherwise navigate it, since the records can and are supposed to be UTF16 | 03:04 |
TheJulia | of course, coming from CI there are really no guarantees | 03:04 |
stevebaker[m] | TheJulia: I'm sure they're stored in UTF16, but efibootmgr will surely only ever display in an encoding supported by the console? | 03:19 |
TheJulia | not quite sure, because we got the definitive utf16 chars from a customer case's efibootmgr output | 03:20 |
TheJulia | we might need to go back and hunt it down even though we were able to simulate it in the unit tests | 03:21 |
TheJulia | the frustrating thing here is using standard locale means we can't trust the output of it at all for compares | 03:25 |
TheJulia | only way really to make sure would be to reproduce it with a shell and inject one of those fun characters | 03:26 |
rpittau | good morning ironic! o/ | 07:25 |
opendevreview | Mark Goddard proposed openstack/bifrost stable/2023.1: Fix key-order ansible errors https://review.opendev.org/c/openstack/bifrost/+/902040 | 09:24 |
dtantsur | iurygregory: I'm absolutely sure we need to retry any errors 500 from Ironic in IPA | 11:04 |
dtantsur | I cannot remember why we don't do it already | 11:04 |
iurygregory | good morning Ironic | 11:06 |
iurygregory | dtantsur, ok! I will work on the patch for it | 11:06 |
opendevreview | Maryna Savchenko proposed openstack/ironic-python-agent master: Fix referencing to the raid_device var which is not set https://review.opendev.org/c/openstack/ironic-python-agent/+/900324 | 11:38 |
opendevreview | Maryna Savchenko proposed openstack/ironic-python-agent master: Fix referencing to the raid_device var which is not set https://review.opendev.org/c/openstack/ironic-python-agent/+/900324 | 11:40 |
opendevreview | Merged openstack/ironic master: Replace swiftclient usage with openstacksdk https://review.opendev.org/c/openstack/ironic/+/899999 | 11:40 |
opendevreview | Merged openstack/ironic master: Document wsgi_service fix from 16a806f https://review.opendev.org/c/openstack/ironic/+/902115 | 11:40 |
opendevreview | Maryna Savchenko proposed openstack/ironic-python-agent master: Fix referencing to the raid_device var which is not set https://review.opendev.org/c/openstack/ironic-python-agent/+/900324 | 11:40 |
opendevreview | Merged openstack/ironic stable/2023.2: Properly cleanup unix sockets in wsgi_service https://review.opendev.org/c/openstack/ironic/+/902116 | 12:28 |
iurygregory | dtantsur, looking at the request code for ConnectionError, ProxyError is derived from it. so the retry should work, no? https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/inspector.py#L135-L145 | 12:30 |
iurygregory | we should probably add log to _post_to_inspector I think ... | 12:31 |
dtantsur | iurygregory: is it really derived this way? I'd be really surprised if TCP-layer errors were a superclass for HTTP-layer ones. | 13:52 |
dtantsur | yeah, it seems to be the case. fun. | 13:53 |
dtantsur | iurygregory: ah, this breaks it https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/inspector.py#L148 | 13:54 |
dtantsur | we don't raise for HTTP code (there is no raise_for_status) | 13:54 |
* iurygregory | switching back to the context for 502 proxy bug | 13:57 |
iurygregory | so we should just update the if? I think we are retrying it, but we fail anyway and we just raise it https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/inspector.py#L149 https://paste.opendev.org/show/be7I3IOGrGKRX5OfP1z3/ | 14:04 |
opendevreview | Mark Goddard proposed openstack/bifrost stable/2023.1: ansible-lint: Skip key-order[play] https://review.opendev.org/c/openstack/bifrost/+/902150 | 14:35 |
opendevreview | Merged openstack/ironic-lib master: Increase the ESP partition size to 550 MB https://review.opendev.org/c/openstack/ironic-lib/+/900330 | 14:38 |
adam-metal3 | dtantsur,jayF: I would be interested to learn about the dnsmasq/dhcp topic could I join the discussion ? | 15:01 |
JayF | yep | 15:01 |
dtantsur | absolutely | 15:01 |
JayF | reminder you owe us follow ups on the https auth support for ipa too ;) | 15:01 |
JayF | ima see if I have your email to forward the invite | 15:02 |
dtantsur | I'll write an email explaining the context in a few minutes | 15:02 |
JayF | my "I have a small headcold from travelling" is morphing into "I think my body is trying to grow another human in my sinuses" so I apologize in advance if I sound like a muppet | 15:02 |
JayF | adam-metal3: I don't have a good email for you for a calendar invite, DM me one? | 15:03 |
dtantsur | JayF: we can delay this call, it's not urgent | 15:03 |
dtantsur | I'd rather not have you suffer on it | 15:04 |
JayF | I'm not suffering right now | 15:04 |
adam-metal3 | JayF: dm sent | 15:04 |
rpittau | JayF, dtantsur, I'd be glad to participate too | 15:05 |
JayF | now your address I have :D | 15:06 |
rpittau | :) | 15:06 |
dtantsur | an annoyingly long email has been sent | 15:22 |
JayF | dtantsur: how is this handled for k8s-native things? | 15:26 |
JayF | dtantsur: do they just do static ip configuration for all containers? | 15:26 |
JayF | I'm wondering if k8s itself has any kind of dhcp service going on | 15:26 |
JayF | dtantsur: ... is there any reason we have to only have one provisioning network? | 15:33 |
JayF | give each dnsmasq instance its own separate network, have Ironic have some awareness of it | 15:34 |
JayF | I guess this is no different than taking like, a /23 and splitting it into 4 logical /25-sized dhcp ranges | 15:35 |
TheJulia | JayF: oh noes :( But have you asked an AI for "JayF as a muppet?" | 15:35 |
JayF | really this just needs to be static'd all the way through | 15:35 |
dtantsur | JayF: several provisioning networks is a tough ask for many operators | 15:35 |
TheJulia | it is not unheard of, really | 15:35 |
dtantsur | not unheard of, but I'd be worried about making it a requirement | 15:36 |
dtantsur | although, if we only do it for a multi-conductor setup... | 15:36 |
TheJulia | That is fair | 15:36 |
JayF | Well the other option would be | 15:36 |
JayF | try to do something more or less fully static | 15:36 |
TheJulia | you have to put an opinion someplace | 15:36 |
JayF | where you'd need "N" IP addresses for "N" nodes | 15:36 |
dtantsur | which does not cancel out the problem: an Ironic Node may be handled by a conductor on a different network | 15:36 |
JayF | I think I'm still missing context then | 15:36 |
TheJulia | multi-provisioning network just makes it more complex, in that any conductor may become responsible | 15:36 |
JayF | yeah, basically that's what you need | 15:37 |
JayF | you have to get completely rid of conductor locality | 15:37 |
TheJulia | unless each conductor has a unique one which is its own and they don't share | 15:37 |
TheJulia | but... | 15:37 |
TheJulia | Where is a bar tender, I need tequila | 15:37 |
JayF | you can do that with static dhcp + on-demand-api-built pxe configs | 15:37 |
JayF | I'm trying to think of how to solve the problem without N IPs for N nodes | 15:37 |
JayF | (we did N ips w/N nodes in OnMetal for this kinda setup, where we really didn't care what conductor did what) | 15:37 |
dtantsur | There are many ways to make the whole thing much more complicated :) I'd rather keep it at least as complex as it is now (which does not mean simple) | 15:38 |
JayF | I'd rather us make the software more complex and have the design in the end make sense in both models than us come up with something that sorta is jammed into place and kinda works | 15:38 |
JayF | or at least, understand the path to ^ while hammering the temporary path into place lol | 15:38 |
dtantsur | At some point, the complexity outweighs even the best solution | 15:39 |
dtantsur | and in the bare-metal world, we're always close to this point | 15:39 |
JayF | Perhaps; but I'm thinking manage_agent_boot is what we did in OnMetal for this, and it was tech debt for a long time | 15:39 |
JayF | and if we had designed it better then, we might have this problem solved now, and might not have wasted people-days dancing around that old, now-removed feature | 15:39 |
dtantsur | I don't quite see how that would solve anything | 15:40 |
JayF | You could 100% do what you want, with metal3, with the onmetal-style manage_agent_boot setup w/static DHCP like we did at onmetal | 15:40 |
JayF | it was just gross and bad :) | 15:40 |
dtantsur | heh | 15:40 |
JayF | I'm saying, if we had instead of hacking something together for the short term then, had solved the problem, we might not be there now | 15:41 |
dtantsur | I already have static DHCP, that's actually the problem | 15:41 |
dtantsur | if it was less static, I would not have the issue of directing a node to the right conductor | 15:41 |
rpittau | TheJulia: hi! if you have a moment today can you please have a look at my answer to your comment in https://review.opendev.org/c/openstack/ironic/+/894918 ?thanks! | 15:41 |
JayF | dtantsur: I think part of my problem is I don't grok why "right conductor" matters in a world where any conductor can provide a valid DHCP config | 15:41 |
JayF | dtantsur: I'm assuming a homogenous conductor group and driver set, is that a bad assumption? | 15:42 |
dtantsur | We're not in that world | 15:42 |
dtantsur | that's really what my problem is :) | 15:42 |
JayF | Did you not propose a feature to do that, as part of this solution? | 15:42 |
TheJulia | rpittau: fair, I'm just thinking explicitly setting new device/media could also be booted from | 15:42 |
TheJulia | rpittau: that seems "adminy" to me..... | 15:42 |
TheJulia | but maybe member is just right as long as the rights match up | 15:42 |
TheJulia | Then again, custom policy exists for a reason, operators who are worried about it can just say "only admins may" | 15:43 |
dtantsur | JayF: this is one way of doing that. But then, we also have the dnsmasq DHCP interface, and it feels like we're developing two approaches to the same thing in parallel.. | 15:43 |
JayF | dtantsur: I guess my mental model of the cleanest way to do this is "make it so that any conductor can handle any query", which would use both of those features | 15:43 |
JayF | dtantsur: if we don't have both, then it gets a lot harder | 15:44 |
rpittau | TheJulia: ok, I see what you mean, but I'm not too worried honestly, just looking at other policies | 15:44 |
dtantsur | JayF: that may be the path we take.. let's wait for the call itself before we draw a conclusion :) | 15:44 |
rpittau | earlier this morning I saw a couple of failures in the metal3 integration job, I rechecked and it seems ok now, but please keep an eye on it | 15:53 |
TheJulia | rpittau: I'm the only one who raised it, so likely on the being more conservative side of access controls side, which aligns with my personality as for risk management, in other words, likely sane to proceed as-is | 15:57 |
rpittau | TheJulia: thanks! :) | 15:58 |
TheJulia | https://ab9c7f011d4a8adf9dae-cec36eea8e90c9127fc5a72b798cfeab.ssl.cf5.rackcdn.com/901182/7/check/ironic-tempest-bfv/b58deaf/controller/logs/ironic-bm-logs/node-2_console_2023-11-28-18%3A53%3A39_log.txt <-- this makes me stupidly happy | 16:02 |
opendevreview | Julia Kreger proposed openstack/ironic master: DNM: CI test for httpboot jobs https://review.opendev.org/c/openstack/ironic/+/901182 | 16:10 |
TheJulia | hmm... downloaded shim was only 2503 bytes. That seems, wrong. | 16:10 |
opendevreview | Julia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Test multiple boot interfaces as part of one CI job https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/902171 | 16:44 |
TheJulia | JayF: ^ | 16:45 |
TheJulia | Anyone else keeping an eye on the httpboot changes, ^ is largely the ideal end state for testing, instead of adding a bunch of new scenario jobs | 16:47 |
iurygregory | I will take a look at it | 16:48 |
TheJulia | iurygregory: everything httpboot through the http network boot variants should be good to review, I'm only just trying to figure out if the trouble is grub in general or other unrelated sadness in the change after it. | 17:09 |
iurygregory | TheJulia, ok o/ | 17:14 |
TheJulia | Looks like we need to get the bug supervisor fixed on networking-generic-switch | 17:18 |
TheJulia | https://bugs.launchpad.net/networking-generic-switch | 17:18 |
dtantsur | uh oh | 17:22 |
JayF | I'll email. | 17:25 |
rpittau | bye everyone! o/ | 17:25 |
dtantsur | JayF: btw we're considering a metal3 ptg-which-cannot-be-called-ptg, possibly in late January | 17:25 |
JayF | Metal3 Teams Gathering | 17:25 |
JayF | M3TG | 17:25 |
JayF | with the superscript it'll look extra cool | 17:25 |
dtantsur | adam-metal3: ^^^ :D | 17:25 |
adam-metal3 | dtantsur,JayF: done :D | 17:27 |
JayF | dtantsur: TheJulia: email out to all members of that generic-switch-drivers group, asking them to change it over to ironic-drivers | 17:28 |
JayF | oooh. no. It's the semi-annual Metal3 FORGE | 17:30 |
* JayF | is going to be thinking of metal3+gathering puns for the next two weeks | 17:30 |
TheJulia | looks like the manager is set to Vasyl | 17:30 |
JayF | TheJulia: yep, I included him on the email as well, just wanted to make the net as large as possible | 17:31 |
TheJulia | So it is funny, the video on my left monitor right now is of someone forging a Wakizashi | 17:31 |
opendevreview | Verification of a change to openstack/ironic master failed: Fix *_by_arch documentation and un-deprecate the options without it https://review.opendev.org/c/openstack/ironic/+/901958 | 17:41 |
JayF | just going to say | 17:46 |
JayF | http boot is super cool | 17:46 |
JayF | about as exciting of a feature to review as I've done in a while | 17:46 |
JayF | Ooooh, how satisfying. The UEFI HTTP Boot spec *explicitly* indicates that an http boot is performed via a PXE environment | 17:51 |
JayF | so the name stays the same, it's just like, DHCP-PXE vs HTTP-PXE | 17:51 |
JayF | so satisfying | 17:52 |
JayF | rpittau: https://zuul.opendev.org/t/openstack/build/fea192cc73e14d74a985cb47a6b8c205 another metal3 integration failure; it looks like something in our script to setup the config (it's applying upper-constraints to the install of ironic-lib from git and failing) | 17:55 |
dtantsur | thought it has been fixed by https://github.com/metal3-io/ironic-image/pull/434 | 17:57 |
JayF | dtantsur: is there a way for me to see the sha of the image it should be getting? | 18:00 |
JayF | if it's intermittent, I wonder if there's a stale image cached somewhere | 18:00 |
JayF | I have the sha in the logs it has, I just don't know the cncf/2023 way to see what the correct image sha I want is | 18:01 |
JayF | TheJulia: dtantsur: NGS launchpad is fixed | 19:05 |
TheJulia | awesome | 19:45 |
TheJulia | Any chance I can get a review or two on https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/901213 specifically so we can fix the snmp job on the main gate for now | 20:04 |
iurygregory | TheJulia, +2 lgtm | 20:08 |
stevebaker[m] | good morning | 20:24 |
iurygregory | morning stevebaker[m] o/ | 20:33 |
iurygregory | regarding the idea to try to speed up our multipath checking (since we have the crazy scenario with 84 disks with 4 paths each) https://review.opendev.org/c/openstack/ironic-python-agent/+/902012 this is a WIP on how I think we should try to handle things, not sure if it makes sense, if anyone can provide feedback I would appreciate it o/ | 20:54 |
TheJulia | o/ stevebaker[m] https://meet.google.com/ady-rbqz-uia | 20:59 |
-opendevstatus- | NOTICE: The Gerrit service on review.opendev.org will be restarting momentarily for a patch update to address a recently observed regression preventing some changes from merging | 21:09 |
TheJulia | stevebaker[m]: https://www.compart.com/en/unicode/U+00FF <-- all sadness in efi nvram entry handling code due to this character | 21:37 |
TheJulia | iurygregory: two thoughts added | 21:49 |
iurygregory | TheJulia, tks! | 21:50 |
opendevreview | Merged openstack/ironic-tempest-plugin master: Add snmp variant of ramdisk iso boot test https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/901213 | 22:01 |
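
Note on the 11:04 and 12:30 to 14:04 exchange about retrying HTTP 500-class errors: a retry that only reacts to exceptions never sees a 5xx response unless something turns that status into an exception, which is the role raise_for_status would play. The following is a minimal stdlib-only sketch of that idea. `RetryableStatus`, `raise_for_server_error`, `post_with_retry`, and the `send` callable are hypothetical names made up for illustration; this is not the code in ironic_python_agent/inspector.py.

```python
import time


class RetryableStatus(Exception):
    """Hypothetical exception raised for HTTP 5xx so a retry loop can see it."""


def raise_for_server_error(status_code):
    # Turn a 5xx status into an exception instead of returning it silently.
    # Anything else (2xx, 4xx, ...) is passed through unchanged.
    if 500 <= status_code < 600:
        raise RetryableStatus(f"server returned HTTP {status_code}")
    return status_code


def post_with_retry(send, attempts=3, delay=0.0):
    """Call send() until it yields a non-5xx status or attempts run out.

    send is a hypothetical zero-argument callable returning a status code;
    it may also raise ConnectionError for transport-level failures.
    attempts is assumed to be at least 1.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return raise_for_server_error(send())
        except (ConnectionError, RetryableStatus) as exc:
            # Both transport failures and 5xx responses are retried.
            last_error = exc
            time.sleep(delay)
    # Every attempt failed: surface the last failure to the caller.
    raise last_error
```

With this shape, a caller passing a `send` that returns 502 twice and then 200 gets 200 back, while a `send` that always returns 502 ends with `RetryableStatus` after the configured number of attempts.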
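
Note on the efibootmgr encoding discussion (01:24 to 03:26, and the U+00FF link at 21:37): the same character has different byte forms in UTF-8 and UTF-16, and bytes valid in one encoding are not necessarily decodable in the other. A small sketch, assuming nothing about IPA's real parsing code; `decode_boot_label` is a hypothetical helper, and the try-UTF-8-then-UTF-16 order is an assumption for illustration, not an agreed fix.

```python
CHAR = "\u00ff"  # LATIN SMALL LETTER Y WITH DIAERESIS

# The same character, two different byte sequences.
utf8_bytes = CHAR.encode("utf-8")        # two bytes: 0xC3 0xBF
utf16_bytes = CHAR.encode("utf-16-le")   # two bytes: 0xFF 0x00


def decode_boot_label(raw):
    """Hypothetical helper: decode as UTF-8, fall back to UTF-16-LE.

    A lone 0xFF byte is never valid in UTF-8, so UTF-16-LE data containing
    U+00FF raises UnicodeDecodeError on the first attempt and takes the
    fallback path.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("utf-16-le", errors="replace")
```

One caveat with this kind of fallback: UTF-16-LE text made only of ASCII characters also decodes without error as UTF-8, just with embedded NUL characters, so a failed decode is not a reliable way to detect which encoding a tool actually emitted.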