*** hjensas is now known as hjensas|afk | 00:04 | |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP Scoped RBAC Devstack Plugin support https://review.opendev.org/c/openstack/ironic/+/778957 | 00:31 |
opendevreview | Julia Kreger proposed openstack/ironic master: Remove redundant/legacy is_admin logic https://review.opendev.org/c/openstack/ironic/+/796552 | 00:31 |
arne_wiebalck | Good morning Ironic! | 05:22 |
stendulker | Good morning arne_wiebalck ! | 05:23 |
arne_wiebalck | Hey stendulker, how are you doing? | 05:24 |
stendulker | I'm good. Thank you. How are things at your end? | 05:25 |
arne_wiebalck | All well over here as well, things are getting back to normal (slowly) and summer is coming :) | 05:26 |
stendulker | That's nice. Cases are also decreasing here and we should probably open up by month end. | 05:52
arne_wiebalck | That's great to hear! | 06:15 |
iurygregory | good morning arne_wiebalck stendulker and Ironic o/ | 06:30 |
stendulker | good morning iurygregory ! | 06:31 |
arne_wiebalck | Hey iurygregory o/ | 06:31 |
iurygregory | o/ | 06:31 |
* arne_wiebalck wonders if iurygregory is an alligator now | 06:31 | |
iurygregory | arne_wiebalck, not yet =( I think I need the 2nd dose =) | 06:32 |
arne_wiebalck | iurygregory: ok, will check then again :-D | 06:32 |
iurygregory | arne_wiebalck, yeah :D probably a few weeks after 24/07 :D | 06:32 |
janders | good morning arne_wiebalck stendulker iurygregory and Ironic o/ | 07:11 |
arne_wiebalck | hey janders o/ | 07:11 |
stendulker | good morning janders | 07:11 |
iurygregory | hey janders o/ | 07:13 |
*** rpittau|afk is now known as rpittau | 07:15 | |
rpittau | good morning ironic! o/ | 07:15 |
iurygregory | morning rpittau o/ | 07:18 |
rpittau | hey iurygregory :) | 07:18 |
opendevreview | Aija Jauntēva proposed openstack/ironic master: Redfish: Skip non-RAID controllers for storage https://review.opendev.org/c/openstack/ironic/+/796592 | 07:19 |
ajya | Good morning Ironic, a while back I mentioned an issue with MySQL 8.0.19+ where the duplicate key error was not correctly parsed after MySQL changes, which broke inspection (https://storyboard.openstack.org/#!/story/2008901) | 07:41
ajya | duplicate key error parsing is now fixed in oslo.db 9.0.1, what would be necessary in Ironic? Bump oslo.db version? Anything else? | 07:42 |
ajya | s/9.0.1/9.1.0 | 07:42 |
iurygregory | I think a version bump is the way to go | 07:43 |
rpittau | ajya: a version bump should be fine, considering that we already have a story behind the issue | 07:43
rpittau | ajya: thanks for checking that btw, I marked the story as triaged; if you could add a comment (even just a link to the oslo.db fix) that would be great | 07:47
ajya | thanks, rpittau | 07:47 |
ajya | would that qualify for a release note saying that the "port already exists" error got "fixed"? | 07:48
iurygregory | a release note would be good to explain why we need to bump =) | 07:48 |
rpittau | ajya: release note is definitely required, although not as a fix but as an upgrade | 07:49 |
ajya | great, thanks | 07:51 |
rpittau | sure :) | 07:52 |
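For context on the bug discussed above: oslo.db parses MySQL's "Duplicate entry" error message to populate DBDuplicateEntry.columns, and MySQL 8.0.19 changed that message format. A minimal sketch of the affected pattern (illustrative, not Ironic's actual code; insert_port and MACAlreadyExists are hypothetical stand-ins):

```python
from oslo_db import exception as db_exc


class MACAlreadyExists(Exception):
    """Hypothetical application-level error for a duplicate MAC."""


def insert_port(session, values):
    # Stand-in for the real SQLAlchemy insert; pretend the unique
    # constraint on the MAC address fired.
    raise db_exc.DBDuplicateEntry(columns=['address'])


def create_port(session, values):
    try:
        return insert_port(session, values)
    except db_exc.DBDuplicateEntry as exc:
        # With the message parsing broken on MySQL 8.0.19+, exc.columns
        # came back empty, this branch was never taken, and callers saw
        # a generic duplicate-key error instead of the specific one.
        if 'address' in exc.columns:
            raise MACAlreadyExists(values['address'])
        raise
```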
*** sshnaidm|afk is now known as sshnaidm | 08:50 | |
rpittau | stendulker: is it ok to approve https://review.opendev.org/c/openstack/ironic/+/796289 even if HPE CI is failing ? | 09:29 |
stendulker | rpittau: there is some internal issue in our CI. Failures are not related to the patch. Tests are failing within 5-6 mins. Will update you. | 09:42
rpittau | stendulker: ack | 09:43 |
opendevreview | Merged openstack/ironic master: Document managed inspection https://review.opendev.org/c/openstack/ironic/+/796487 | 10:06 |
opendevreview | Merged openstack/ironic master: dhcp-less: mention how to provide network_data to instance https://review.opendev.org/c/openstack/ironic/+/796478 | 10:15 |
dtantsur | morning/afternoon ironic | 10:48 |
opendevreview | Arne Wiebalck proposed openstack/ironic-python-agent master: Only mount the ESP if not yet mounted https://review.opendev.org/c/openstack/ironic-python-agent/+/796045 | 11:11 |
janders | see you tomorrow Ironic o/ | 12:02 |
opendevreview | Dmitry Tantsur proposed openstack/ironic stable/wallaby: dhcp-less: mention how to provide network_data to instance https://review.opendev.org/c/openstack/ironic/+/796569 | 12:06 |
opendevreview | Dmitry Tantsur proposed openstack/ironic bugfix/18.0: Handle non-key-value params in [inspector]extra_kernel_params https://review.opendev.org/c/openstack/ironic/+/796570 | 12:07 |
opendevreview | Dmitry Tantsur proposed openstack/ironic stable/wallaby: Handle non-key-value params in [inspector]extra_kernel_params https://review.opendev.org/c/openstack/ironic/+/796638 | 12:18 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Fix typos in inspection docs https://review.opendev.org/c/openstack/ironic/+/796640 | 12:21 |
opendevreview | Riccardo Pittau proposed openstack/ironic bugfix/18.0: dhcp-less: mention how to provide network_data to instance https://review.opendev.org/c/openstack/ironic/+/796655 | 13:18 |
opendevreview | Merged openstack/ironic bugfix/18.0: Handle non-key-value params in [inspector]extra_kernel_params https://review.opendev.org/c/openstack/ironic/+/796570 | 13:33 |
TheJulia | good morning | 13:37 |
opendevreview | Merged openstack/ironic stable/wallaby: dhcp-less: mention how to provide network_data to instance https://review.opendev.org/c/openstack/ironic/+/796569 | 13:39 |
arne_wiebalck | I see this quite reproducibly when deploying nodes: http://paste.openstack.org/show/806693/ | 14:55 |
arne_wiebalck | Some race in oslo logging? | 14:55 |
TheJulia | arne_wiebalck: the out of order or the exception? | 14:59 |
arne_wiebalck | TheJulia: the exception | 15:09 |
TheJulia | possibly, that feels like deja vu | 15:11 |
NobodyCam | Happy Hump day Ironic'ers | 15:25 |
dtantsur | o/ | 15:25 |
arne_wiebalck | Hey NobodyCam o/ | 15:28 |
NobodyCam | good Morning dtantsur and arne_wiebalck | 15:29 |
TheJulia | good morning NobodyCam | 15:45 |
rpittau | hey NobodyCam :) | 15:45 |
NobodyCam | good morning TheJulia and rpittau !!! | 15:45 |
* TheJulia glares at sqlalchemy | 15:51 | |
* TheJulia listens to very loud industrial rock, and continues glaring at sqlalchemy | 16:00 | |
arne_wiebalck | TheJulia: I found another deploy without the error: http://paste.openstack.org/show/806698/ | 16:02 |
arne_wiebalck | TheJulia: It seems there is another thread? | 16:02 |
* arne_wiebalck is sorry to interrupt Kraftwerk | 16:03 | |
TheJulia | yes, it is | 16:03 |
TheJulia | likely power sync coming in and trying to run on a locked node | 16:03 |
TheJulia | which became locked between when the query ran and when the task launched | 16:04 |
arne_wiebalck | Doesn't the message indicate both threads try to deploy? | 16:08 |
arne_wiebalck | To me, it looks like one thread is holding the lock, deploys, is finished, releases the lock, starts cleanup ... then another thread grabs the lock, wants to deploy and uses a node which is being cleaned up. | 16:10 |
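A sketch of the race TheJulia suspects (illustrative, not Ironic's actual periodic task): the power-sync loop queries the node list first and acquires locks afterwards, so a node can become locked in between:

```python
from ironic.common import exception
from ironic.conductor import task_manager


def sync_power_states(context, node_ids):
    # node_ids was computed by a query that ran before any locking.
    for node_id in node_ids:
        try:
            with task_manager.acquire(context, node_id,
                                      purpose='power state sync') as task:
                task.driver.power.get_power_state(task)
        except exception.NodeLocked:
            # Another thread (e.g. a deploy) grabbed the lock between
            # the query and the acquire; skip until the next cycle.
            continue
```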
rpittau | good night! o/ | 16:11 |
*** rpittau is now known as rpittau|afk | 16:11 | |
TheJulia | arne_wiebalck: hmm | 16:13 |
TheJulia | arne_wiebalck: it could be the result of stacking heartbeats.... | 16:13 |
TheJulia | both for continue deploy, which would do it | 16:13
arne_wiebalck | hmm, something like http://paste.openstack.org/show/806700/ ? | 16:18
arne_wiebalck | note the identical timestamps | 16:18 |
arne_wiebalck | from what I see heartbeat would try to get a shared lock, then upgrade it ... I do not see this at the given timestamps (only before and after) | 16:21 |
TheJulia | hmm | 16:22 |
TheJulia | I've seen this, i thought it was something like the power sync or stacked queued heartbeats both with shared locks thinking they can upgrade | 16:23 |
TheJulia | one beats the other, and kaboom | 16:23 |
arne_wiebalck | there seems to be no heartbeat close by, the last one is like 45 secs earlier | 16:24 |
TheJulia | I guess it would make sense if we also logged the request id in with the task | 16:25 |
TheJulia | at least around lock actions | 16:25 |
arne_wiebalck | there is no power sync close by either | 16:25
arne_wiebalck | request id == task manager context? | 16:26 |
TheJulia | openstack request id... and it should be in the request context | 16:27 |
TheJulia | arne_wiebalck: https://opendev.org/openstack/oslo.context/src/branch/master/oslo_context/context.py#L61-L62 | 16:28 |
TheJulia | global request id can be passed in, local request_id is generated | 16:28 |
TheJulia | aiui | 16:28 |
TheJulia | and by passed in, I mean by the API user | 16:28 |
TheJulia | for cross-service request tracking | 16:29
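A small demonstration of the behaviour TheJulia describes, using oslo.context's public API (see the link above); the global id value here is made up:

```python
from oslo_context import context

# The local request_id is always generated, in the form 'req-<uuid>'.
ctx = context.RequestContext()
print(ctx.request_id)         # e.g. req-8e3f...
print(ctx.global_request_id)  # None unless the API caller passed one

# For cross-service tracking the API user supplies the global id,
# and every service logs it next to its own local request_id.
ctx = context.RequestContext(
    global_request_id='req-9f4a2b1c-0d5e-4c6f-8a7b-1c2d3e4f5a6b')
print(ctx.global_request_id)
```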
arne_wiebalck | from what I see we set this already ... is only the logging missing? | 16:40 |
TheJulia | arne_wiebalck: correct, only logging | 16:53 |
arne_wiebalck | TheJulia: ok, I am on it ... what would we expect to see? | 16:53
TheJulia | a uuid that we can trace back to the api request that triggered it | 16:54 |
TheJulia | well, req-<uuid> | 16:55 |
arne_wiebalck | you expect for the two lock attempts there will be different UUIDs? | 16:57 |
TheJulia | yes, in this case | 17:00 |
TheJulia | realistically the req-id getting logged will allow it to be backtraced | 17:01 |
arne_wiebalck | I instrumented the code to dump the context on reserve and release ... started a deploy now ... | 17:05 |
opendevreview | Dmitry Tantsur proposed openstack/ironic stable/wallaby: Handle non-key-value params in [inspector]extra_kernel_params https://review.opendev.org/c/openstack/ironic/+/796638 | 17:16 |
frigo | hello Ironic! | 17:20 |
TheJulia | hello frigo! | 17:20 |
arne_wiebalck | TheJulia: Different request IDs: http://paste.openstack.org/show/806702/ | 17:23 |
TheJulia | arne_wiebalck: can you trace them back in your api log? | 17:24 |
TheJulia | or elsewhere? | 17:24 |
TheJulia | also, didn't mean for you to log your entire context, since context.auth_token is a thing...... | 17:24 |
arne_wiebalck | yes, look here: http://paste.openstack.org/show/806703/ | 17:27 |
arne_wiebalck | the same id is used for a heartbeat 1 min earlier | 17:27 |
arne_wiebalck | not sure what this means, though | 17:28 |
TheJulia | it was retrying to get the lock | 17:37 |
TheJulia | so! I guess wait until an exception occurs and backtrace it? | 17:38 |
arne_wiebalck | print a backtrace from where the oslo exception occurs? | 17:39 |
TheJulia | no, you're going to have to take the req-id from where the backtrace occurred, and work backwards | 17:39
arne_wiebalck | the same id is used in a heartbeat 1min earlier ... why does the purpose change? | 17:43 |
TheJulia | oh | 17:45 |
TheJulia | hmm | 17:45 |
TheJulia | was there another heartbeat in between? | 17:45 |
*** ricolin_ is now known as ricolin | 17:49 | |
arne_wiebalck | there are two simultaneously | 17:53 |
arne_wiebalck | and one in between | 17:53 |
TheJulia | what ironic version is this | 17:59 |
arne_wiebalck | Victoria | 18:00 |
TheJulia | I *think* dmitry fixed a thing where we would stack heartbeats in Wallaby. | 18:00 |
arne_wiebalck | this is the sequence of locks for heartbeat http://paste.openstack.org/show/806707/ | 18:02 |
arne_wiebalck | there are quite a few heartbeats :) | 18:02
opendevreview | Julia Kreger proposed openstack/ironic master: Fix node detail instance_uuid request handling https://review.opendev.org/c/openstack/ironic/+/796720 | 18:05 |
opendevreview | Julia Kreger proposed openstack/ironic master: Remove _get_nodes_by_instance method https://review.opendev.org/c/openstack/ironic/+/796721 | 18:05 |
arne_wiebalck | TheJulia: I think the change you're referring to is in Victoria already | 18:05 |
opendevreview | Julia Kreger proposed openstack/ironic master: Fix node detail instance_uuid request handling https://review.opendev.org/c/openstack/ironic/+/796720 | 18:06 |
opendevreview | Julia Kreger proposed openstack/ironic master: Remove _get_nodes_by_instance method https://review.opendev.org/c/openstack/ironic/+/796721 | 18:06 |
TheJulia | arne_wiebalck: oh :( | 18:07 |
arne_wiebalck | "Do not retry locking when heartbeating" | 18:07 |
TheJulia | wow... that is some aggressive heartbeating | 18:07 |
arne_wiebalck | https://github.com/openstack/ironic/commit/e6e774f524735cd33863d079e536a668345af262 | 18:08 |
arne_wiebalck | if that is the one? | 18:08 |
arne_wiebalck | we do not change any heartbeat config from the defaults | 18:09 |
arne_wiebalck | (I *think* :-)) | 18:09 |
arne_wiebalck | ok, getting late here (Pizza almost ready ;) | 18:10 |
arne_wiebalck | I think the heartbeat is waiting for the excl lock, I do not understand why the purpose changes | 18:10 |
arne_wiebalck | nor why there are so many | 18:11 |
arne_wiebalck | tomorrow :) | 18:11 |
TheJulia | arne_wiebalck: yes, go eat pizza :) | 18:11 |
arne_wiebalck | thanks TheJulia, have a good night everyone o/ | 18:11
TheJulia | arne_wiebalck: I think it is going to take braincells and some time focusing on it to wrap our heads around exactly what is going on. It shouldn't be waiting for an exclusive lock at this point, if I remember correctly | 18:13
arne_wiebalck | what is 'it', the heartbeat? | 18:14 |
TheJulia | yeah | 18:14 |
arne_wiebalck | right, dtantsur's patch should prevent this from what I see | 18:14 |
arne_wiebalck | no retry on lock upgrade | 18:15 |
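Roughly the pattern under discussion (hedged; the parameter follows the linked commit, but this is not the exact Ironic code): a heartbeat takes a shared lock and then upgrades it, and the fix disables retrying on the upgrade so a losing heartbeat fails fast with NodeLocked instead of queuing:

```python
from ironic.conductor import task_manager


def heartbeat(context, node_id, callback_url):
    with task_manager.acquire(context, node_id, shared=True,
                              purpose='heartbeat') as task:
        # Without retry=False, several queued heartbeats could all sit
        # here waiting to upgrade -- the "stacking" mentioned above.
        task.upgrade_lock(purpose='heartbeat', retry=False)
        # ... process the heartbeat under the exclusive lock ...
```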
TheJulia | ahh | 18:15 |
arne_wiebalck | huh? | 18:15 |
TheJulia | I've got 3 concurrent conversations | 18:16 |
TheJulia | okay | 18:16 |
TheJulia | so heartbeat comes in | 18:16 |
TheJulia | lets call it hb1, hb1 has no prior lock, it gets the lock. | 18:17 |
TheJulia | then hb2 rapidly comes in before the lock is committed to the db. It thinks it can also get an exclusive lock | 18:17
TheJulia | is that what you're thinking? | 18:17
TheJulia | since we're not upgrading | 18:18 |
arne_wiebalck | it looks like the deploy task holds the lock and the moment it releases it, hb2 grabs it | 18:19 |
arne_wiebalck | which is 1min after it tried last time | 18:19 |
arne_wiebalck | I am sorry, I have to go, otherwise I will have charcoal for dinner :-D | 18:20 |
arne_wiebalck | thanks TheJulia o/ | 18:20 |
TheJulia | so we're still stacking them \o/ | 18:20 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP Scoped RBAC Devstack Plugin support https://review.opendev.org/c/openstack/ironic/+/778957 | 18:47 |
frigo | hey, thank you TheJulia, dtantsur, Chris for the review. I am abandoning the change for the S3 RFE, if you remember (https://storyboard.openstack.org/#!/story/2008671). Thanks for the time you spent on the review :) I put some rationale on why the change is abandoned on the code review | 18:48
TheJulia | frigo: okay :( | 18:52 |
TheJulia | fwiw, if you want to api-ize it, the awesome folks at Oath added a kickstart deploy interface | 18:54 |
JayF | s/Oath/literally any other name for the company now known as Verizon Media/ | 19:01 |
JayF | lol | 19:01 |
frigo | you have more info? don't know what oath is | 19:02 |
JayF | frigo: I am abandoning this change. What we wanted to do is provision baremetal servers, but we don't need Ironic for that. We can just use an ISO with a kickstart file, and that's good enough for us. | 19:02
JayF | frigo: that use case is supported by Ironic as of Wallaby | 19:02 |
TheJulia | s/Verizon Media/$NewCorpNameHere/ | 19:02 |
frigo | I hope whatever you build on metal3 baremetal-operator, I can use on top of my baremetal operator, cause I basically copy pasted most of the spec there :D | 19:02 |
JayF | frigo: you can provision machines, with kickstart, using Ironic :) | 19:02 |
JayF | but you're obviously welcome to pick whatever tool suits your needs best. Good luck :D | 19:03 |
frigo | yeah, but I can also provision machines without Ironic with kickstart, and it turned out quite easy for my needs of course | 19:03
frigo | :D I will continue using Ironic anyway, as we use that to deploy OpenStack with RHOSP | 19:04 |
TheJulia | everyone's needs are different | 19:04 |
TheJulia | gmann: how much push back would I get if I added ironic to this list https://github.com/openstack/tempest/blob/7e96c8e854386f43604ad098a6ec7606ee676145/tempest/config.py#L1232 | 19:17 |
TheJulia | otherwise we need to do it plugin-localized, which we can do, but ugh | 19:17
* TheJulia sighs | 19:31 | |
TheJulia | arne_wiebalck: Tomorrow! If you could take a glance at https://review.opendev.org/c/openstack/ironic-python-agent/+/796068 — we've seen a case where people may provide bad images, and I think this would allow us to handle that case, or at least try to, when they use software RAID + UEFI. | 19:35
opendevreview | Arun S A G proposed openstack/ironic master: Add documentation for anaconda deploy interface https://review.opendev.org/c/openstack/ironic/+/796110 | 19:47 |
opendevreview | Julia Kreger proposed openstack/ironic-python-agent master: WIP/DNM: Remove md check on EFI asset preservation https://review.opendev.org/c/openstack/ironic-python-agent/+/796068 | 19:54 |
TheJulia | arne_wiebalck: stevebaker: looks like _configure_grub would still need to run | 19:55 |
opendevreview | Arun S A G proposed openstack/ironic master: Add documentation for anaconda deploy interface https://review.opendev.org/c/openstack/ironic/+/796110 | 21:32 |
gmann | TheJulia: you can add ironic there from plugin side itself. same way you do Ironic in service_available config group - https://github.com/openstack/ironic-tempest-plugin/blob/master/ironic_tempest_plugin/plugin.py#L42 | 21:34 |
gmann | each tempest plugin can register their service in this group and check whether scope is enabled or not in a consistent way | 21:36
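A hedged sketch of what gmann suggests (option and class names are illustrative, modelled on the linked plugin.py rather than copied from it):

```python
from oslo_config import cfg
from tempest import config as tempest_config
from tempest.test_discover import plugins

ironic_scope_opt = cfg.BoolOpt(
    'ironic',
    default=False,
    help='Whether the ironic service is enforcing token scope.')


class IronicTempestPlugin(plugins.TempestPlugin):
    # Other required plugin methods (load_tests, get_opt_lists, ...)
    # are omitted from this fragment.
    def register_opts(self, conf):
        # Register our flag inside tempest's existing enforce_scope
        # group, the same way service_available.ironic is registered.
        tempest_config.register_opt_group(
            conf, tempest_config.enforce_scope_group, [ironic_scope_opt])
```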
janders | good morning Ironic o/ | 22:28 |
stevebaker | morning | 22:30 |
stevebaker | TheJulia: just for the softraid whole-disk case? | 22:30 |
TheJulia | ahh, hmm... Seems like it is possible for us to at least wire in a high-level handler to be able to toggle tempest tests so they can use system-scope requests | 22:31
TheJulia | gmann: ^^ | 22:31 |
TheJulia | stevebaker: yes | 22:31 |
stevebaker | ok | 22:31 |
gmann | TheJulia: yes, the main challenge (or extra work) is that we have to toggle all the tests together if we enable scope on the service side. My plan is to add a separate non-voting job with enforce_scope true on the service side, keep switching the test cases, and at the end, when all the test cases are ready with system-scope token requests, enable enforce_scope in the normal jobs too. | 22:50
TheJulia | gmann: realistically, they are all new tempest tests which are required. But they can be in a separate class with separate initial configuration for the client | 22:52 |
TheJulia | the old class can be deprecated or something, but from an interop/compliance standpoint, the old tests are still valid on older releases of openstack. | 22:52
gmann | TheJulia: you can also switch the old existing tests rather than only adding new ones | 22:52
TheJulia | yes, it could be a config option as well | 22:53 |
gmann | yes | 22:53 |
TheJulia | but ultimately, configuration options are the bane of anyone trying to use tempest | 22:53 |
gmann | TheJulia: this is how I am working on nova tempest tests (need some work on devstack side though) https://review.opendev.org/c/openstack/tempest/+/740122/10/tempest/api/compute/admin/test_hypervisor.py | 22:54 |
gmann | TheJulia: this is only one config option, which tells the test whether enforce_scope is true or not on the service side | 22:54
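A sketch of the test-side toggle, modelled on the linked nova change (class and client names are illustrative; an ironic test would key off an equivalent enforce_scope.ironic flag instead):

```python
from tempest.api.compute import base
from tempest import config

CONF = config.CONF


class HypervisorAdminTest(base.BaseV2ComputeAdminTest):
    # Request system-scoped admin credentials alongside the usual ones.
    credentials = ['primary', 'admin', 'system_admin']

    @classmethod
    def setup_clients(cls):
        super(HypervisorAdminTest, cls).setup_clients()
        if CONF.enforce_scope.nova:
            # Service enforces scope: use the system-scoped client.
            cls.client = cls.os_system_admin.hypervisor_client
        else:
            cls.client = cls.os_admin.hypervisor_client
```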
TheJulia | that keeps it fairly simple depending on the client; however, for things like scenario jobs where varying levels of access may be required, I can see that being a lot more work | 22:58
TheJulia | although, not that much. In essence, those would have to be new tests regardless | 22:59 |
TheJulia | since they wouldn't operationally be backwards compatible | 23:00 |
gmann | yeah, scenario tests might need more work depending on what different credentials/APIs we use in that test | 23:00
TheJulia | indeed | 23:05 |