Tuesday, 2023-09-12

ravlewGood morning ironic09:41
ravlewI'm still getting the fedora-latest was not found error on stable/yoga09:43
ravlewis there any ETA for the fix?09:44
opendevreviewMahnoor Asghar proposed openstack/ironic master: Add inspection hooks  https://review.opendev.org/c/openstack/ironic/+/89081710:10
dtantsurTheJulia, JayF, we should really fix the fact that not all fields can be specified on node creation...10:48
dtantsurravlew, someone needs to check and/or fix https://review.opendev.org/c/openstack/bifrost/+/89375410:49
opendevreviewMahnoor Asghar proposed openstack/ironic master: Add inspection hooks  https://review.opendev.org/c/openstack/ironic/+/89353310:51
ravlewI see, thanks dtantsur11:17
TheJuliagood morning12:55
dtantsurmorning TheJulia! safe home?12:55
TheJuliayes, finally got home yesterday evening12:56
dtantsurnice12:58
* TheJulia tries to wake up13:29
JayFdtantsur: maybe we should put that item on the ptg13:44
dtantsurIs it going to be controversial?13:44
JayFWhy would it be?13:45
dtantsurI don't think so either. Then it's rather a Just Do It item than a PTG discussion?13:45
JayFI'm using that ptg board as a place to put any new work streams for next cycle, not just the ones that we might want to talk about13:46
JayFThen when it comes time to actually schedule the gathering will pare it down to topics that have stuff to discuss13:47
dtantsurah cool13:48
TheJuliamgoddard: o/ Are you guys using etcd with networking-generic-switch ?14:25
TheJuliaIt looks like pulling the lock lease is failing, if you have any insight it would likely help JayF - https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_be9/888051/2/check/networking-generic-switch-tempest-dlm/be9c8a0/controller/logs/screen-q-svc.txt14:26
TheJuliaJayF: I feel like we could likely mark the dlm job non-voting for now, and just add a release note noting there is an issue with interacting with etcd, and we're still investigating14:27
opendevreviewJulia Kreger proposed openstack/ironic master: Support port name: API  https://review.opendev.org/c/openstack/ironic/+/76556914:34
opendevreviewMerged openstack/ironic master: Fix minor grammar issues in the help for new inspector options  https://review.opendev.org/c/openstack/ironic/+/89013814:45
dtantsurwow, the metal3 job has caught a real breakage on https://review.opendev.org/c/openstack/ironic/+/863999! I wonder why nothing else did...14:47
TheJuliasecure boot, only the grub job would even get close to that and I think it is ipmi14:49
opendevreviewVerification of a change to openstack/ironic master failed: CI: Remove ubuntu focal job  https://review.opendev.org/c/openstack/ironic/+/89401414:49
dtantsurno, it was a regression in the boot mode support14:49
TheJuliaoh, hmm14:49
TheJuliathe explicit API?14:49
dtantsurnope, the normal boot mode stuff14:50
TheJuliaweird14:50
dtantsurvery weird indeed14:50
dtantsursushy-tools reports None, my code is waiting for "bios" in vain14:50
dtantsurI'm not sure it can happen in real hardware, but who knows..14:50
TheJuliawe don't switch any of the jobs dynamically afaik14:51
TheJuliastart state is the expected end state14:51
dtantsuryeah, but start state is None. at least here.14:51
dtantsur'boot': {'allowed_values': ['Pxe', 'Cd', 'Hdd'], 'enabled': <BootSourceOverrideEnabled.CONTINUOUS: 'Continuous'>, 'mode': None, 'target': <BootSource.PXE: 'Pxe'>}14:52
JayFMy Sharding CI job is failing because ... we don't have permission to set shard? https://review.opendev.org/c/openstack/ironic/+/894460/5/devstack/lib/ironic#2587 14:54
JayFhow can that cred have ability to create node and set properties but not shards14:54
JayFdo we put a policy file in devstack jobs?14:54
dtantsurTheJulia, I suspect the boot mode cannot be detected in metal3-dev-env, which snowballs into all sorts of problems14:55
dtantsurI wonder if we should even try to change *anything* if the current mode is None...14:55
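
[editor's note] The question above (whether to attempt any change when the current boot mode is None) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `wait_for_boot_mode`, `get_mode`, `BootModeUnknown`, and the timeout values are illustration names, not ironic or sushy APIs. It only shows the idea of bailing out early when no mode is reported, rather than waiting for "bios" in vain.

```python
import time


class BootModeUnknown(Exception):
    """Raised when the BMC does not report a boot mode at all."""


def wait_for_boot_mode(get_mode, expected, timeout=5.0, interval=0.1,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_mode() until it returns `expected`.

    get_mode is a hypothetical callable returning 'bios', 'uefi' or None.
    """
    deadline = clock() + timeout
    while True:
        mode = get_mode()
        if mode is None:
            # Nothing to compare against: give up instead of waiting forever.
            raise BootModeUnknown("boot mode is not reported; not waiting")
        if mode == expected:
            return mode
        if clock() >= deadline:
            raise TimeoutError(
                f"boot mode is still {mode!r}, wanted {expected!r}")
        sleep(interval)
```

Whether raising, skipping, or logging is the right reaction to None is exactly the open question in the discussion; the sketch picks raising only to make the early exit visible.
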
TheJuliaJayF: updating shard requires devstack-system-admin based upon current policy14:57
TheJuliafallout from keeping the original "admin is admin everywhere" bug14:58
TheJuliaand that we support both, since TC has wanted to eradicate the system scoped model entirely14:58
JayFIn what world does it make sense that node:update:shard is a higher permission than node:create14:58
TheJulianode:create has logic to permit project admins to create a node, and record the project in owner14:58
TheJuliawe likely just need to loosen the policy there for project scoped admins14:59
TheJuliabut if you look, we restrict other fields14:59
TheJuliafor now, I'd make the change as devstack-system-admin instead of devstack-admin14:59
JayFhttps://github.com/openstack/ironic/blob/master/ironic/common/policy.py#L989 SYSTEM_OR_PROJECT_ADMIN ?14:59
JayFI will try and figure out how to do that14:59
TheJuliaand we can revisit the RBAC policies/restrictions at the PTG based upon changes/evolution14:59
TheJuliayou can do that as well15:00
JayFlike here's the thing15:00
JayFif you can set shard on create, you can make an argument for the current setup15:00
JayFbecause updating shard is a footgun, potentially15:00
JayFyeah I'll try to elevate devstack creds instead, I'm not convinced project admin should have that ability15:00
JayFwe should fix create15:01
TheJuliabut look at yeah, I think that might have been the intent to restrict changing the shard *as much as possible*15:01
TheJuliaso it didn't be come a footgun used15:01
TheJulias/be come/become/15:01
TheJuliayou're going to need something like SYSTEM_OR_OWNER_ADMIN15:01
JayFso sounds like there are two follow ups (so far) from this CI attempt: 1) Make node create take shard (and other stuff?) 2) Fix sharding permissions to allow owners to set shard15:02
JayFbut for now, I'm going to elevate the creds to get it passing15:02
JayFsince updating the policy needs to be done in a backportable way15:03
TheJuliaSYSTEM_OR_OWNER_ADMIN = ( '(' + SYSTEM_ADMIN + ') or (' + PROJECT_OWNER_ADMIN + ')' )15:03
TheJuliathat is why discussion is really required, since to change policy on a backport we would have to go  "we got this wrong"15:04
TheJuliaand treat it just as a bugfix15:04
JayFIMO we should *only* change it if we determine it is a bug15:05
TheJuliaagreed15:05
JayFand fixing the client to set shard on create dulls this corner15:05
TheJuliavery much so15:05
TheJuliaalmost entirely, really15:05
TheJuliasince the agreed upon behavior was to set it and never really ever change it15:05
TheJuliabecause "bad things will happen" :)15:05
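
[editor's note] The SYSTEM_OR_OWNER_ADMIN rule proposed earlier in this exchange is plain string composition of two policy check strings. A minimal Python sketch follows. The two base strings are placeholders assumed for illustration; the real SYSTEM_ADMIN and PROJECT_OWNER_ADMIN definitions live in ironic/common/policy.py and are not reproduced here. `either` is a hypothetical helper, not part of ironic or oslo.policy.

```python
# Placeholder check strings, assumed for illustration only.
SYSTEM_ADMIN = 'role:admin and system_scope:all'
PROJECT_OWNER_ADMIN = 'role:admin and project_id:%(node.owner)s'


def either(first, second):
    """Combine two check strings so that satisfying one of them is enough."""
    return '(' + first + ') or (' + second + ')'


# Same shape as the expression quoted in the chat.
SYSTEM_OR_OWNER_ADMIN = either(SYSTEM_ADMIN, PROJECT_OWNER_ADMIN)
```

The parentheses matter: without them, the `and` and `or` operators of the two checks would combine in an unintended order.
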
JayFnow to sift through a few thousand lines of lib/ironic to find how to call with different creds lol15:06
TheJuliait is not a unique need in there15:06
TheJuliabaremetal --os-cloud devstack-system-admin continuecommandhere15:06
JayFI figured I'd run across an example if I read enough :)15:07
* TheJulia goes back to slide deck and le castle vania15:07
JayFSpeaking of, I got a talk accepted at SeaGL 2023 the first week in November15:07
TheJuliaNice!15:07
TheJuliaCongrats!15:07
JayF"Trust in Open Source" (mainly focusing on how you have to ensure your incentives are aligned with the projects moreso than anything else)15:08
opendevreviewJay Faulkner proposed openstack/ironic master: [CI] Support for running with shards  https://review.opendev.org/c/openstack/ironic/+/89446015:11
JayFthanks for the insight, I think that should do the trick15:11
JayFI might need to add something based on if we're enforcing scope if we were to move that into more jobs15:11
TheJuliaJayF: I think that is still going to fail15:12
TheJuliadevstack-admin is a project scoped admin, not a system admin15:12
JayFI read it and tried to figure out which it needed and flipped a coin15:15
* JayF changes to -system-15:16
TheJulia:)15:16
opendevreviewJay Faulkner proposed openstack/ironic master: [CI] Support for running with shards  https://review.opendev.org/c/openstack/ironic/+/89446015:16
opendevreviewDmitry Tantsur proposed openstack/ironic master: Redfish: wait for secure boot state change if it's not immediate  https://review.opendev.org/c/openstack/ironic/+/86399916:31
dtantsurJayF, I hope this is much clearer for the operators ^^16:31
JayFI'm a big fan of the guardrails you added16:32
JayF+2 with an optional note16:33
dtantsurThx!16:33
dtantsurMeanwhile, we're still unable to reproduce the database locked issue in the metal3 job after the recent round of patches.16:35
JayFthere's a more celebratory way to say that ;)16:36
JayFwhich will have the side effect of immediately causing a failure if it's not true16:36
JayFLOL16:36
dtantsurthat's why I'm so careful ;)16:36
dtantsurlike apparently they don't say The Q Word in American hospitals :D16:37
TheJuliaQ word?16:39
JayFI don't actually know this Q word16:39
dtantsurTheJulia, "What a quiet day today" - "NOOOOOOOOOOO"16:39
JayFI learned that rule working fast food, not a hospital16:40
JayFlol16:40
* dtantsur has learned recently16:40
JayF> "What a quiet day today"  > [manager runs in frantically] "TWO BUSES!!!!"16:40
TheJuliaOh, American culture is about institutional under-staffing to ensure maximum shareholder benefit16:40
TheJuliabecause, profit!16:40
TheJuliaso there are no quiet days, ever16:40
dtantsurThe under-staffing aspect is unfortunately not unique to the USA (the exact reasons may differ)16:40
TheJuliaJayF: in that case, the manager is the third bus16:43
JayFfast food managers are both instruments and victims of the system in that case, they were under all the busses the whole time :(16:43
TheJuliaIndeed16:44
JayFhttps://review.opendev.org/c/openstack/ironic/+/894460 this is concerning17:33
JayFlooks like everything worked except the part where it didn't work17:33
JayFI'm asking infra to hold one for me17:35
opendevreviewHarald Jensås proposed openstack/ironic master: devstack - configurable ipv6 address mode  https://review.opendev.org/c/openstack/ironic/+/89362218:56
TheJuliaJayF: looks like it gets set, but it also looks like it never polls :\20:10
JayFTheJulia: define: it20:15
JayFshard gets set, but ? never polls20:15
TheJulianova, doesn't appear to actually grab a list of nodes20:15
JayFTheJulia: I have an infra hold out for one of these, so I'll get a look soon20:15
JayFit's very possible I just messed up the config but it looked OK from the output20:16
JayFit's OK though, there's a reason I wanted to get this running during RC period :)20:16
JayFI am not the best at manual QA/QC so I'll not feel confident until this job works20:16
* TheJulia lays down for a little bit20:19
JayFI may also be stepping out a bit early, I'm not feeling great. Trying to make it a bit further. I suspect I'll end up working on prelude today and dedicate morning-brain to sharding ci20:19
TheJuliaYeah, I’m laying down because suddenly not feeling great myself, likely just ripples of exhaustion from the last two days20:20
JayFI'd imagine so20:21
JayFwtf, shard isn't even in the query Sep 12 20:33:34 np0035232105 devstack@ir-api.service[86195]: [pid: 86195|app: 0|req: 203/203] 173.231.255.247 () {68 vars in 1732 bytes} [Tue Sep 12 20:33:34 2023] GET20:35
JayF/baremetal/v1/nodes?fields=uuid%2Cpower_state%2Ctarget_power_state%2Cprovision_state%2Ctarget_provision_state%2Clast_error%2Cmaintenance%2Cproperties%2Cinstance_uuid%2Ctraits%2Cresource_class => generated 1253 bytes in 21 msecs (HTTP/1.1 200) 7 headers in 285 bytes (2 switches on core 0)20:35
JayFand afaict everything is configured properly20:36
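
[editor's note] The observation that shard is missing from the logged request can be checked mechanically. The Python sketch below parses the request path copied from the log line above and lists its query parameters. It assumes the filter would show up as a `shard` query parameter; `query_params` and `requested_fields` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlsplit

# Request path copied from the uwsgi log line in the chat.
REQUEST = (
    '/baremetal/v1/nodes?fields=uuid%2Cpower_state%2Ctarget_power_state'
    '%2Cprovision_state%2Ctarget_provision_state%2Clast_error'
    '%2Cmaintenance%2Cproperties%2Cinstance_uuid%2Ctraits%2Cresource_class'
)


def query_params(request_path):
    """Return the query string as a dict of parameter name -> list of values."""
    return parse_qs(urlsplit(request_path).query)


def requested_fields(request_path):
    """Return the comma-separated `fields` parameter as a list of names."""
    values = query_params(request_path).get('fields', [''])
    return [name for name in values[0].split(',') if name]


# Only `fields` is present; there is no shard filter in the request.
has_shard_filter = 'shard' in query_params(REQUEST)
```

Here `has_shard_filter` is False, which matches what the log shows: the node list is requested without any shard restriction.
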
JayFI'm starting to wonder if my testing was invalid, I'm not seeing any way this coulda worked20:45
JayFand if it's broken this way, it's because we don't have proper support in openstacksdk for filtering by sharding20:48
JayFand for some reason we use ironicclient throughout the nova driver except to get nodes20:49
JayFI have no idea how the hell I tested this working20:49
JayFhttps://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L797 I am essentially seeing no evidence in logs that this is effective20:53
JayFnever gets added to the query20:54
JayFthat uses openstacksdk, it looks like everything is hooked up thru for shards20:54
JayFI'm very confused20:54
JayFjohnthetubaguy: help ^ tomorrow please20:55
JayFI need to step away, I'm really starting to feel worse rapidly. I'll be in tomorrow morning to tackle this again20:56
JayFI think I figured it out while walking the dog. Testing it now.21:34
JayFyep21:38
opendevreviewJay Faulkner proposed openstack/ironic master: [CI] Support for running with shards  https://review.opendev.org/c/openstack/ironic/+/89446021:44
JayFaight, I think that's the problem21:45
JayFalthough it's clear to me it could've never worked before, so I must have had some weird thing when testing before where like, I was somehow able to provision to the pre-sharded versions of the nodes in nova's cache (?)21:46
JayFI'm honestly not sure how nova works at that level, which probably is why I screwed up the manual testing :/21:46
