Wednesday, 2021-06-16

00:04 *** hjensas is now known as hjensas|afk
00:31 <opendevreview> Julia Kreger proposed openstack/ironic master: WIP Scoped RBAC Devstack Plugin support  https://review.opendev.org/c/openstack/ironic/+/778957
00:31 <opendevreview> Julia Kreger proposed openstack/ironic master: Remove redundant/legacy is_admin logic  https://review.opendev.org/c/openstack/ironic/+/796552
05:22 <arne_wiebalck> Good morning Ironic!
05:23 <stendulker> Good morning arne_wiebalck !
05:24 <arne_wiebalck> Hey stendulker, how are you doing?
05:25 <stendulker> I'm good. Thank you. How are things at your end?
05:26 <arne_wiebalck> All well over here as well, things are getting back to normal (slowly) and summer is coming :)
05:52 <stendulker> That's nice. Cases are also decreasing here and we probably should open up by month end.
06:15 <arne_wiebalck> That's great to hear!
06:30 <iurygregory> good morning arne_wiebalck stendulker and Ironic o/
06:31 <stendulker> good morning iurygregory !
06:31 <arne_wiebalck> Hey iurygregory o/
06:31 <iurygregory> o/
06:31 * arne_wiebalck wonders if iurygregory is an alligator now
06:32 <iurygregory> arne_wiebalck, not yet =( I think I need the 2nd dose =)
06:32 <arne_wiebalck> iurygregory: ok, will check then again :-D
06:32 <iurygregory> arne_wiebalck, yeah :D probably a few weeks after 24/07 :D
07:11 <janders> good morning arne_wiebalck stendulker iurygregory and Ironic o/
07:11 <arne_wiebalck> hey janders o/
07:11 <stendulker> good morning janders
07:13 <iurygregory> hey janders o/
07:15 *** rpittau|afk is now known as rpittau
07:15 <rpittau> good morning ironic! o/
07:18 <iurygregory> morning rpittau o/
07:18 <rpittau> hey iurygregory :)
07:19 <opendevreview> Aija Jauntēva proposed openstack/ironic master: Redfish: Skip non-RAID controllers for storage  https://review.opendev.org/c/openstack/ironic/+/796592
07:41 <ajya> Good morning Ironic, a while back I mentioned an issue with mysql 8.0.19+ where the duplicate key error was not correctly parsed after the mysql changes and broke inspection (https://storyboard.openstack.org/#!/story/2008901)
07:42 <ajya> duplicate key error parsing is now fixed in oslo.db 9.0.1, what would be necessary in Ironic? Bump the oslo.db version? Anything else?
07:42 <ajya> s/9.0.1/9.1.0
07:43 <iurygregory> I think a version bump is the way to go
07:43 <rpittau> ajya: a bump in version should be ok, considering that we already have a story behind the issue
07:47 <rpittau> ajya: thanks for checking that btw, I marked the story as triaged, if you could add a comment (even just a link to the oslo.db fix) that would be great
07:47 <ajya> thanks, rpittau
07:48 <ajya> would that qualify for a release note saying that the "port already exists" error got "fixed"?
07:48 <iurygregory> a release note would be good to explain why we need to bump =)
07:49 <rpittau> ajya: a release note is definitely required, although not as a fix but as an upgrade
07:51 <ajya> great, thanks
07:52 <rpittau> sure :)
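[Editor's note: a sketch of the parsing problem ajya describes. MySQL 8.0.19 started prefixing the key name with the table name in its duplicate-entry error message, which broke parsers that expected the bare key name. The messages and regex below are illustrative only; the authoritative fix is the one in oslo.db 9.1.0 referenced in the story.]

```python
import re

# Pre- and post-8.0.19 shapes of the same duplicate-key error
# (example values; a real message would name an actual port address).
msg_pre_8019 = "Duplicate entry '52:54:00:aa:bb:cc' for key 'uniq_ports0address'"
msg_post_8019 = "Duplicate entry '52:54:00:aa:bb:cc' for key 'ports.uniq_ports0address'"

# A tolerant pattern accepts an optional "table." qualifier before the
# key name, so both message shapes yield the same constraint name.
DUP_KEY_RE = re.compile(
    r"Duplicate entry '(?P<value>.*)' for key '(?:[^.']+\.)?(?P<key>[^.']+)'")

for msg in (msg_pre_8019, msg_post_8019):
    match = DUP_KEY_RE.match(msg)
    print(match.group('key'))  # uniq_ports0address in both cases
```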
08:50 *** sshnaidm|afk is now known as sshnaidm
09:29 <rpittau> stendulker: is it ok to approve https://review.opendev.org/c/openstack/ironic/+/796289 even if HPE CI is failing ?
09:42 <stendulker> rpittau: there is some internal issue in our CI. Failures are not related to the patch. Tests are failing in 5-6 mins. Will update you.
09:43 <rpittau> stendulker: ack
10:06 <opendevreview> Merged openstack/ironic master: Document managed inspection  https://review.opendev.org/c/openstack/ironic/+/796487
10:15 <opendevreview> Merged openstack/ironic master: dhcp-less: mention how to provide network_data to instance  https://review.opendev.org/c/openstack/ironic/+/796478
10:48 <dtantsur> morning/afternoon ironic
11:11 <opendevreview> Arne Wiebalck proposed openstack/ironic-python-agent master: Only mount the ESP if not yet mounted  https://review.opendev.org/c/openstack/ironic-python-agent/+/796045
12:02 <janders> see you tomorrow Ironic o/
12:06 <opendevreview> Dmitry Tantsur proposed openstack/ironic stable/wallaby: dhcp-less: mention how to provide network_data to instance  https://review.opendev.org/c/openstack/ironic/+/796569
12:07 <opendevreview> Dmitry Tantsur proposed openstack/ironic bugfix/18.0: Handle non-key-value params in [inspector]extra_kernel_params  https://review.opendev.org/c/openstack/ironic/+/796570
12:18 <opendevreview> Dmitry Tantsur proposed openstack/ironic stable/wallaby: Handle non-key-value params in [inspector]extra_kernel_params  https://review.opendev.org/c/openstack/ironic/+/796638
12:21 <opendevreview> Dmitry Tantsur proposed openstack/ironic master: Fix typos in inspection docs  https://review.opendev.org/c/openstack/ironic/+/796640
13:18 <opendevreview> Riccardo Pittau proposed openstack/ironic bugfix/18.0: dhcp-less: mention how to provide network_data to instance  https://review.opendev.org/c/openstack/ironic/+/796655
13:33 <opendevreview> Merged openstack/ironic bugfix/18.0: Handle non-key-value params in [inspector]extra_kernel_params  https://review.opendev.org/c/openstack/ironic/+/796570
13:37 <TheJulia> good morning
13:39 <opendevreview> Merged openstack/ironic stable/wallaby: dhcp-less: mention how to provide network_data to instance  https://review.opendev.org/c/openstack/ironic/+/796569
14:55 <arne_wiebalck> I see this quite reproducibly when deploying nodes: http://paste.openstack.org/show/806693/
14:55 <arne_wiebalck> Some race in oslo logging?
14:59 <TheJulia> arne_wiebalck: the out of order or the exception?
15:09 <arne_wiebalck> TheJulia: the exception
15:11 <TheJulia> possibly, that feels like deja vu
15:25 <NobodyCam> Happy Hump day Ironic'ers
15:25 <dtantsur> o/
15:28 <arne_wiebalck> Hey NobodyCam o/
15:29 <NobodyCam> good Morning dtantsur and arne_wiebalck
15:45 <TheJulia> good morning NobodyCam
15:45 <rpittau> hey NobodyCam :)
15:45 <NobodyCam> good morning TheJulia and rpittau !!!
15:51 * TheJulia glares at sqlalchemy
16:00 * TheJulia listens to very loud industrial rock, and continues glaring at sqlalchemy
16:02 <arne_wiebalck> TheJulia: I found another deploy without the error: http://paste.openstack.org/show/806698/
16:02 <arne_wiebalck> TheJulia: It seems there is another thread?
16:03 * arne_wiebalck is sorry to interrupt Kraftwerk
16:03 <TheJulia> yes, it is
16:03 <TheJulia> likely power sync coming in and trying to run on a locked node
16:04 <TheJulia> which became locked between when the query ran and when the task launched
16:08 <arne_wiebalck> Doesn't the message indicate both threads try to deploy?
16:10 <arne_wiebalck> To me, it looks like one thread is holding the lock, deploys, is finished, releases the lock, starts cleanup ... then another thread grabs the lock, wants to deploy and uses a node which is being cleaned up.
16:11 <rpittau> good night! o/
16:11 *** rpittau is now known as rpittau|afk
16:13 <TheJulia> arne_wiebalck: hmm
16:13 <TheJulia> arne_wiebalck: it could be the result of stacking heartbeats....
16:13 <TheJulia> both for continue deploy which would do it
16:18 <arne_wiebalck> hmm sth like http://paste.openstack.org/show/806700/ ?
16:18 <arne_wiebalck> note the identical timestamps
16:21 <arne_wiebalck> from what I see heartbeat would try to get a shared lock, then upgrade it ... I do not see this at the given timestamps (only before and after)
16:22 <TheJulia> hmm
16:23 <TheJulia> I've seen this, I thought it was something like the power sync or stacked queued heartbeats both with shared locks thinking they can upgrade
16:23 <TheJulia> one beats the other, and kaboom
16:24 <arne_wiebalck> there seems to be no heartbeat close by, the last one is like 45 secs earlier
16:25 <TheJulia> I guess it would make sense if we also logged the request id in with the task
16:25 <TheJulia> at least around lock actions
16:25 <arne_wiebalck> there is no power sync close by either
16:26 <arne_wiebalck> request id == task manager context?
16:27 <TheJulia> openstack request id... and it should be in the request context
16:28 <TheJulia> arne_wiebalck: https://opendev.org/openstack/oslo.context/src/branch/master/oslo_context/context.py#L61-L62
16:28 <TheJulia> global request id can be passed in, local request_id is generated
16:28 <TheJulia> aiui
16:28 <TheJulia> and by passed in, I mean by the API user
16:29 <TheJulia> for cross-service request tracking
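[Editor's note: the oslo.context link above is where the local request id comes from. A minimal stdlib sketch of that id format (the real generator lives in oslo_context.context; this mirror is for illustration only) shows why two overlapping operations become distinguishable once the id is logged next to lock actions.]

```python
import uuid

def generate_request_id():
    # Mirrors the format oslo.context uses for the locally generated
    # request id ('req-' + a uuid4); sketch only, not the real helper.
    return 'req-' + str(uuid.uuid4())

# A deploy task and a heartbeat each carry their own context, so
# logging context.request_id around reserve/release makes it possible
# to trace every lock attempt back to the request that triggered it.
deploy_ctx_id = generate_request_id()
heartbeat_ctx_id = generate_request_id()
print(deploy_ctx_id.startswith('req-'))   # True
print(deploy_ctx_id != heartbeat_ctx_id)  # True
```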
16:40 <arne_wiebalck> from what I see we set this already ... is only the logging missing?
16:53 <TheJulia> arne_wiebalck: correct, only logging
16:53 <arne_wiebalck> TheJulia: ok, I am on it to add it ... what would we expect to see?
16:54 <TheJulia> a uuid that we can trace back to the api request that triggered it
16:55 <TheJulia> well, req-<uuid>
16:57 <arne_wiebalck> you expect that for the two lock attempts there will be different UUIDs?
17:00 <TheJulia> yes, in this case
17:01 <TheJulia> realistically the req-id getting logged will allow it to be backtraced
17:05 <arne_wiebalck> I instrumented the code to dump the context on reserve and release ... started a deploy now ...
17:16 <opendevreview> Dmitry Tantsur proposed openstack/ironic stable/wallaby: Handle non-key-value params in [inspector]extra_kernel_params  https://review.opendev.org/c/openstack/ironic/+/796638
17:20 <frigo> hello Ironic!
17:20 <TheJulia> hello frigo!
17:23 <arne_wiebalck> TheJulia: Different request IDs: http://paste.openstack.org/show/806702/
17:24 <TheJulia> arne_wiebalck: can you trace them back in your api log?
17:24 <TheJulia> or elsewhere?
17:24 <TheJulia> also, didn't mean for you to log your entire context, since context.auth_token is a thing......
17:27 <arne_wiebalck> yes, look here: http://paste.openstack.org/show/806703/
17:27 <arne_wiebalck> the same id is used for a heartbeat 1 min earlier
17:28 <arne_wiebalck> not sure what this means, though
17:37 <TheJulia> it was retrying to get the lock
17:38 <TheJulia> so! I guess wait until an exception occurs and backtrace it?
17:39 <arne_wiebalck> print a backtrace from where the oslo exception occurs?
17:39 <TheJulia> no, you're going to have to take the req-id from where the backtrace occurred, and work backwards
17:43 <arne_wiebalck> the same id is used in a heartbeat 1min earlier ... why does the purpose change?
17:45 <TheJulia> oh
17:45 <TheJulia> hmm
17:45 <TheJulia> was there another heartbeat in between?
17:49 *** ricolin_ is now known as ricolin
17:53 <arne_wiebalck> there are two simultaneously
17:53 <arne_wiebalck> and one in between
17:59 <TheJulia> what ironic version is this?
18:00 <arne_wiebalck> Victoria
18:00 <TheJulia> I *think* dmitry fixed a thing where we would stack heartbeats in Wallaby.
18:02 <arne_wiebalck> this is the sequence of locks for heartbeat http://paste.openstack.org/show/806707/
18:02 <arne_wiebalck> there are quite some heartbeats :)
18:05 <opendevreview> Julia Kreger proposed openstack/ironic master: Fix node detail instance_uuid request handling  https://review.opendev.org/c/openstack/ironic/+/796720
18:05 <opendevreview> Julia Kreger proposed openstack/ironic master: Remove _get_nodes_by_instance method  https://review.opendev.org/c/openstack/ironic/+/796721
18:05 <arne_wiebalck> TheJulia: I think the change you're referring to is in Victoria already
18:06 <opendevreview> Julia Kreger proposed openstack/ironic master: Fix node detail instance_uuid request handling  https://review.opendev.org/c/openstack/ironic/+/796720
18:06 <opendevreview> Julia Kreger proposed openstack/ironic master: Remove _get_nodes_by_instance method  https://review.opendev.org/c/openstack/ironic/+/796721
18:07 <TheJulia> arne_wiebalck: oh :(
18:07 <arne_wiebalck> "Do not retry locking when heartbeating"
18:07 <TheJulia> wow... that is some aggressive heartbeating
18:08 <arne_wiebalck> https://github.com/openstack/ironic/commit/e6e774f524735cd33863d079e536a668345af262
18:08 <arne_wiebalck> if that is the one?
18:09 <arne_wiebalck> we do not change any heartbeat config from the defaults
18:09 <arne_wiebalck> (I *think* :-))
18:10 <arne_wiebalck> ok, getting late here (Pizza almost ready ;)
18:10 <arne_wiebalck> I think the heartbeat is waiting for the excl lock, I do not understand why the purpose changes
18:11 <arne_wiebalck> nor why there are so many
18:11 <arne_wiebalck> tomorrow :)
18:11 <TheJulia> arne_wiebalck: yes, go eat pizza :)
18:11 <arne_wiebalck> thanks TheJulia, have a good night everyone o/
18:13 <TheJulia> arne_wiebalck: I think it is going to take braincells and some time focusing on it to wrap our heads around exactly what is going on. It shouldn't be waiting for an exclusive lock at this point, if I remember correctly
18:14 <arne_wiebalck> what is 'it', the heartbeat?
18:14 <TheJulia> yeah
18:14 <arne_wiebalck> right, dtantsur's patch should prevent this from what I see
18:15 <arne_wiebalck> no retry on lock upgrade
18:15 <TheJulia> ahh
18:15 <arne_wiebalck> huh?
18:16 <TheJulia> I've got 3 concurrent conversations
18:16 <TheJulia> okay
18:16 <TheJulia> so heartbeat comes in
18:17 <TheJulia> let's call it hb1, hb1 has no prior lock, it gets the lock.
18:17 <TheJulia> then hb2 rapidly comes in before the lock is committed to the db. It thinks it also can get an exclusive lock
18:17 <TheJulia> is that what you're thinking?
18:18 <TheJulia> since we're not upgrading
18:19 <arne_wiebalck> it looks like the deploy task holds the lock and the moment it releases it, hb2 grabs it
18:19 <arne_wiebalck> which is 1min after it tried the last time
18:20 <arne_wiebalck> I am sorry, I have to go, otherwise I will have charcoal for dinner :-D
18:20 <arne_wiebalck> thanks TheJulia o/
18:20 <TheJulia> so we're still stacking them \o/
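[Editor's note: the race TheJulia sketches above (hb1 wins the node reservation, a stacked hb2 then trips over it) can be modeled with a toy lock. NodeLock and NodeLocked here are illustrative stand-ins, not ironic's actual TaskManager or exception classes.]

```python
class NodeLocked(Exception):
    """Stand-in for ironic's NodeLocked exception (illustration only)."""

class NodeLock:
    """Toy model of a per-node reservation; not ironic's real code."""
    def __init__(self):
        self.holder = None

    def acquire(self, who):
        # Without retries (the "Do not retry locking when heartbeating"
        # change linked above), a second heartbeat fails immediately
        # instead of queueing up behind the first one.
        if self.holder is not None:
            raise NodeLocked('%s: node reserved by %s' % (who, self.holder))
        self.holder = who

    def release(self, who):
        if self.holder == who:
            self.holder = None

lock = NodeLock()
lock.acquire('hb1')          # first heartbeat wins the reservation
try:
    lock.acquire('hb2')      # stacked heartbeat arrives before release
except NodeLocked as exc:
    print(exc)               # hb2: node reserved by hb1
```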
18:47 <opendevreview> Julia Kreger proposed openstack/ironic master: WIP Scoped RBAC Devstack Plugin support  https://review.opendev.org/c/openstack/ironic/+/778957
18:48 <frigo> hey, thank you TheJulia, dtantsur, Chris for the review, I am abandoning the change for the S3 RFE if you remember (https://storyboard.openstack.org/#!/story/2008671). Thanks for the time you spent on the review :) I put some rationale on why the change is abandoned on the code review
18:52 <TheJulia> frigo: okay :(
18:54 <TheJulia> fwiw, if you want to api-ize it, the awesome folks at Oath added a kickstart deploy interface
19:01 <JayF> s/Oath/literally any other name for the company now known as Verizon Media/
19:01 <JayF> lol
19:02 <frigo> you have more info? don't know what oath is
19:02 <JayF> frigo: "I am abandoning this change. What we wanted to do is provision baremetal servers but we don't need Ironic for that. We can just use an iso with a kickstart file, and that's good enough for us."
19:02 <JayF> frigo: that use case is supported by Ironic as of Wallaby
19:02 <TheJulia> s/Verizon Media/$NewCorpNameHere/
19:02 <frigo> I hope whatever you build on metal3 baremetal-operator, I can use on top of my baremetal operator, cause I basically copy-pasted most of the spec there :D
19:02 <JayF> frigo: you can provision machines, with kickstart, using Ironic :)
19:03 <JayF> but you're obviously welcome to pick whatever tool suits your needs best. Good luck :D
19:03 <frigo> yeah, but I can also provision machines without Ironic with kickstart, and it turned out quite easy for my needs of course
19:04 <frigo> :D I will continue using Ironic anyway, as we use that to deploy OpenStack with RHOSP
19:04 <TheJulia> everyone's needs are different
19:17 <TheJulia> gmann: how much push back would I get if I added ironic to this list https://github.com/openstack/tempest/blob/7e96c8e854386f43604ad098a6ec7606ee676145/tempest/config.py#L1232
19:17 <TheJulia> otherwise we need to do it plugin-localized, which we can do, but ugh
19:31 * TheJulia sighs
19:35 <TheJulia> arne_wiebalck: Tomorrow! If you could take a glance at https://review.opendev.org/c/openstack/ironic-python-agent/+/796068 - We've seen a case where people may provide bad images, and I think this would allow us to handle that case, or at least try to, when they try to use software raid + UEFI.
19:47 <opendevreview> Arun S A G proposed openstack/ironic master: Add documentation for anaconda deploy interface  https://review.opendev.org/c/openstack/ironic/+/796110
19:54 <opendevreview> Julia Kreger proposed openstack/ironic-python-agent master: WIP/DNM: Remove md check on EFI asset preservation  https://review.opendev.org/c/openstack/ironic-python-agent/+/796068
19:55 <TheJulia> arne_wiebalck: stevebaker: looks like _configure_grub would still need to run
21:32 <opendevreview> Arun S A G proposed openstack/ironic master: Add documentation for anaconda deploy interface  https://review.opendev.org/c/openstack/ironic/+/796110
21:34 <gmann> TheJulia: you can add ironic there from the plugin side itself, the same way you do Ironic in the service_available config group - https://github.com/openstack/ironic-tempest-plugin/blob/master/ironic_tempest_plugin/plugin.py#L42
21:36 <gmann> each tempest plugin can register their service in this group and check whether the scope is enabled or not in a consistent way
22:28 <janders> good morning Ironic o/
22:30 <stevebaker> morning
22:30 <stevebaker> TheJulia: just for the softraid whole-disk case?
22:31 <TheJulia> ahh, hmm... Seems like it is possible for us to at least wire in a high level handler to be able to toggle tempest tests so they can use system scope requests
22:31 <TheJulia> gmann: ^^
22:31 <TheJulia> stevebaker: yes
22:31 <stevebaker> ok
22:50 <gmann> TheJulia: yes, the main challenge or extra work is that we have to toggle all the tests together if we enable the scope on the service side. My plan is to add a separate non-voting job with enforce_scope true on the service side and keep switching the test cases, and at the end, when all the test cases are ready with system scope token requests, enable enforce_scope in the normal jobs too.
22:52 <TheJulia> gmann: realistically, they are all new tempest tests which are required. But they can be in a separate class with separate initial configuration for the client
22:52 <TheJulia> the old class can be deprecated or something, but from an interop/compliance standpoint, the old tests are still valid on older releases of openstack.
22:52 <gmann> TheJulia: the old existing tests you can also switch, other than adding new ones
22:53 <TheJulia> yes, it could be a config option as well
22:53 <gmann> yes
22:53 <TheJulia> but ultimately, configuration options are the bane of anyone trying to use tempest
22:54 <gmann> TheJulia: this is how I am working on nova tempest tests (needs some work on the devstack side though) https://review.opendev.org/c/openstack/tempest/+/740122/10/tempest/api/compute/admin/test_hypervisor.py
22:54 <gmann> TheJulia: this is only one config option which can tell the test whether enforce_scope is true or not on the service side
22:58 <TheJulia> that keeps it fairly simple depending on the client, however things like scenario jobs where varying levels of access may be required, I can see that being a lot more work
22:59 <TheJulia> although, not that much. In essence, those would have to be new tests regardless
23:00 <TheJulia> since they wouldn't operationally be backwards compatible
23:00 <gmann> yeah scenario tests might need more work, depends on what different credentials/APIs we use in that test
23:05 <TheJulia> indeed
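[Editor's note: gmann's plugin-side approach - register a scope flag for the service and gate tests on it in one place - follows roughly this pattern. CONF here is a stdlib stand-in for tempest's oslo.config-based CONF object, and the option, class, and test names are hypothetical; real plugins register a BoolOpt in their TempestPlugin.register_opts.]

```python
import types
import unittest

# Stand-in for tempest's CONF (illustration only; tempest builds this
# from options that plugins register via oslo.config).
CONF = types.SimpleNamespace(
    enforce_scope=types.SimpleNamespace(ironic=False))

class BaremetalSystemScopeTest(unittest.TestCase):
    """Hypothetical test class gated on service-side scope enforcement."""

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        if not CONF.enforce_scope.ironic:
            # Skip the whole class in one place instead of sprinkling
            # scope checks through every test method.
            raise unittest.SkipTest(
                'enforce_scope is off for ironic; '
                'system-scoped tests do not apply')

    def test_list_nodes_as_system_reader(self):
        # a real test would issue a system-scoped API request here
        pass

def run_suite():
    suite = unittest.TestLoader().loadTestsFromTestCase(
        BaremetalSystemScopeTest)
    result = unittest.TestResult()
    suite.run(result)
    return result

# With the flag off, the class is reported as skipped; flipping the
# flag to True (the non-voting job gmann describes) runs it normally.
print(bool(run_suite().skipped))
```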

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!