rpittau | good morning ironic! o/ | 08:08 |
---|---|---|
opendevreview | Riccardo Pittau proposed openstack/ironic master: [WIP] Generic API for attaching/detaching virtual media https://review.opendev.org/c/openstack/ironic/+/894918 | 08:38 |
opendevreview | Riccardo Pittau proposed openstack/ironic master: Generic API for attaching/detaching virtual media https://review.opendev.org/c/openstack/ironic/+/894918 | 08:38 |
*** eandersson0 is now known as eandersson | 11:50 | |
TheJulia | stevebaker[m]: a few more examples, including one that says CDROM https://storyboard.openstack.org/#!/story/2008763 | 13:59 |
dtantsur | TheJulia: Should I copy my downstream epic into a new upstream RFE? | 14:12 |
TheJulia | I'd likely just make a new RFE for out of band record removal | 14:36 |
TheJulia | Two related, but a little different problems. One helps the other along as well, at least in some cases, but not all cases. | 14:37 |
dtantsur | TheJulia: https://bugs.launchpad.net/ironic/+bug/2042570 | 15:00 |
dtantsur | JayF: https://bugs.launchpad.net/ironic/+bug/2042575 | 15:22 |
JayF | tyvm, if you're feeling extra spicy go link it in the workstreams, even just editing it into the right place woudl be helpful | 15:28 |
JayF | but otherwise i'll find and get it | 15:28 |
dtantsur | if you have the workstreams link handy | 15:29 |
* JayF just going to do it now, I have a minute | 16:05 | |
JayF | I'm a little double busy, I recruited a downstream GR-OSS python dev to help out with python 3.12 support in eventlet. | 16:05 |
JayF | and we're walking through it together. The results are ... concerning to say the least? | 16:06 |
dtantsur | Not a fun activity, I assume | 16:06 |
JayF | https://etherpad.opendev.org/p/nov-2023-eventlet-gross is our very limited notes so far, we're digging in more in a bit | 16:06 |
* dtantsur loves the title and fully agrees with it | 16:06 | |
JayF | dtantsur: I am like, 50% convinced the ironic migrations unit tests issues last cycle may be root-caused with some of the stuff we're finding | 16:07 |
JayF | dtantsur: it's a double-whammy: I work at GR-OSS (G-Research Open Source Software) | 16:07 |
dtantsur | le sigh | 16:07 |
dtantsur | :D | 16:07 |
dtantsur | I still pondering every now and then how hard would it be to migrate off eventlet | 16:08 |
dtantsur | The biggest problem is monkey patching which makes it all-or-nothing | 16:08 |
JayF | That's basically the conversation I'm having with Itamar, is that over time we've basically dug this hole deeper because getting out of it is so intensely complex. | 16:08 |
JayF | Even the simplest possible example: IPA. Extract eventlet monkey patching from it. Good luck! | 16:08 |
dtantsur | "RLock reentrancy in the face of signals and (much more common) garbage collection callbacks" sounds like nightmare material to me | 16:09 |
JayF | https://gist.github.com/itamarst/6883251d4d4e5196f42b90608ab49c37 | 16:09 |
JayF | wanna reproduce it? | 16:09 |
dtantsur | JayF: Well, Actually (tm) | 16:09 |
JayF | dtantsur: first of all, new wsgi server. oslo.service + ssl is going to croak | 16:09 |
dtantsur | I think IPA could be an easy win. It does not do much and has a very low concurrency | 16:09 |
dtantsur | yeah | 16:09 |
JayF | you are forgetting oslo libraries | 16:09 |
JayF | I'm saying even for IPA, it'd be a lot of risk | 16:09 |
JayF | and a lot of moved code | 16:09 |
dtantsur | I'm not really, i'm just mentally prepared to burn them with fire | 16:09 |
JayF | extrapolate that to Ironic. to Nova. to Neutron. | 16:10 |
dtantsur | These are problematic, yes | 16:10 |
JayF | a direct migration path off eventlet is seemingly-impossible | 16:10 |
JayF | but there's a reason more people than me exist, there are smarter people and I have one of them who will rejoin me after eating lunch :D | 16:10 |
dtantsur | What I'm afraid of is getting cornered one day | 16:10 |
JayF | dtantsur: I think we're in the corner | 16:11 |
dtantsur | Not entirely, but on the way into it | 16:11 |
dtantsur | If we cannot make 4.12 work.... | 16:11 |
dtantsur | 3.12, sorry | 16:11 |
JayF | dtantsur: https://opendev.org/openstack/cyborg/src/branch/master/cyborg/common/utils.py#L237 | 16:11 |
dtantsur | (using OCP versions lol) | 16:11 |
JayF | dtantsur: see that? | 16:11 |
TheJulia | wheeeeeeee | 16:11 |
JayF | dtantsur: https://opendev.org/openstack/freezer/src/branch/master/freezer/engine/engine.py#L252 | 16:11 |
JayF | see that? | 16:11 |
JayF | guess what. the freezer one is going to trigger bugs because it's monkey_patched eventlet, and using SimpleQueue, if run on py>3.8 | 16:11 |
dtantsur | the freezer gonna freeze \o/ | 16:12 |
JayF | and nobody knows, because that doesn't run integration tests | 16:12 |
JayF | and we trust our libraries to not just... not have working CI and be generally broken | 16:12 |
JayF | and eventlet doesn't have working CI and has bugs which I'd categorize as being "generally broken" (e.g. simplequeue and rlock race conditions as described in the etherpad) | 16:12 |
dtantsur | Step one could be reaching out to oslo folks and asking for their opinion | 16:13 |
JayF | well, step 0 is where I'm at right now | 16:13 |
JayF | which is make sure I know the full picture of what's going on | 16:13 |
dtantsur | Maybe if someone can dedicate time to make oslo.service eventlet-agnostic, they just accept that | 16:13 |
dtantsur | then we're in a much better place | 16:13 |
rpittau | JayF: hey I guess we'll have to repeat this https://review.opendev.org/c/openstack/ironic/+/899765 for all the other projects? | 16:13 |
JayF | but the more of the picture that gets painted, the less I see of "dog and coffee cup" and the more I see "room on fire" | 16:13 |
* dtantsur puts on his "This is fine" t-short | 16:13 | |
JayF | rpittau: yes-ish, but I wanna make sure it looks like what nova lands, just so we aren't different for different sake | 16:14 |
rpittau | any plan to just switch entirely to pyproject.toml ? | 16:14 |
TheJulia | dtantsur: did you bring enough t-shirts for everyone? :) | 16:14 |
JayF | rpittau: I pushed that one up to get early CI, I just W-1'd it so we can wait for nova one to finalize, that one is failing CI so I think it might be working out some bugs | 16:14 |
JayF | rpittau: none proactive afaict | 16:15 |
rpittau | ok, thanks! | 16:15 |
dtantsur | TheJulia: I have two of them, I think my size should fit you :) | 16:15 |
dtantsur | JayF: it's also possible that a lot of oslo stuff is actually implemented elsewhere, and we're wasting our time with oslo.service. | 16:17 |
* dtantsur needs to leave early today, happy to continue the discussion tomorrow | 16:18 | |
TheJulia | dtantsur: dunno, I need to go bra shopping again :\ | 16:20 |
TheJulia | Perhaps a goal for the TC this cycle might be to move the needle | 16:21 |
TheJulia | w/r/t eventlet | 16:21 |
TheJulia | and dtantsur raises a really good point | 16:21 |
dtantsur | yeah. but the tough question is: how to do it one step at a time? | 16:21 |
TheJulia | With the monkey patching as-is... eek | 16:22 |
dtantsur | exactly | 16:22 |
TheJulia | we almost just need to treat it as a band aid or a piece of tape | 16:22 |
* JayF raises percentage likelihood eventlet threading.RLock problems caused ironic migration unit test issues to 90% | 16:49 | |
JayF | we're writing a reproducer right now. | 16:49 |
JayF | but we've found a potential bug that could cause the mutex around connection cleanup in sqlalchemy to be brokenish in eventlet-patched situations | 16:49 |
JayF | sound familiar? | 16:49 |
TheJulia | very much so | 16:55 |
JayF | nothing is proven until it is, though :) | 16:55 |
clarkb | JayF: re IPA I'm completely naive to its design btu seems like it doesn't realy need multiple threads of execution? Just process requests from control one at a time? Its not like you'll clean the disk and update firmware at the same time on hardware right? That seems like a recipe for interesting behaviors :) | 17:11 |
JayF | clarkb: disk image streaming | 17:12 |
JayF | clarkb: while also heartbeating progress to ironic api | 17:12 |
JayF | and disks can take days to shred | 17:12 |
clarkb | ok so you need two threads :) | 17:12 |
clarkb | a larger update then delete eventlet and remove monkey patching would be to just run in a single thread but maybe not terrible | 17:12 |
rpittau | good night! o/ | 17:13 |
JayF | that is likely the route we'd go, but IPA was just an example for OpenStack* | 17:21 |
JayF | because of the terrible we're discovering around eventlet | 17:21 |
JayF | Itamar is going to try to fix the RLock bug that has been reproducing in eventlet's CI for ages, but it's a tough one | 17:21 |
JayF | good news; if it's this bug; it only impacted unit tests | 17:31 |
opendevreview | Julia Kreger proposed openstack/sushy-tools master: Add Support testing for HttpBootUri https://review.opendev.org/c/openstack/sushy-tools/+/896963 | 17:42 |
JayF | we're going to dig on this rlock bug, I'm less convinced it caused our migrations issues just because the case it'd have to hit for this to be the root cause would be if sqlalchemy was imported, in a unit test, before eventlet was monkey patched (as far as we can tell) | 17:42 |
JayF | that's at least one of them | 17:43 |
opendevreview | Jay Faulkner proposed openstack/ironic master: eventlet monkey patch in unit tests earlier https://review.opendev.org/c/openstack/ironic/+/899976 | 17:49 |
JayF | TheJulia: dtantsur ^ I believe that is the workaround-fix for the migrations test root cause | 17:53 |
JayF | TheJulia: dtantsur: monkey_patching after importing something that imports sqlalchemy means sqlalchemy created an ungreened RLock, we have proof that can be broken in eventlet | 17:53 |
TheJulia | joy | 17:54 |
JayF | for non-unit-test codepaths, we were already monkey patching earlier, so we shouldn't have seen it IRL | 17:54 |
JayF | Itamar is going to improve the upgrade code, including adding angry messages if an RLock gets missed during upgrades | 17:54 |
JayF | which would allow us to roll that change back, hook unit tests back up, and prove that was the root cause | 17:54 |
JayF | this is the most satisfying find in years | 17:55 |
opendevreview | Julia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/898006 | 17:57 |
TheJulia | unfortunately it was also fairly hit and miss | 18:03 |
TheJulia | but it *would* prove to be a huge part of the unit test failure pain removed | 18:04 |
TheJulia | Okay, I've got a good patch for IPA to navigate the fun configuration drive bug | 18:19 |
TheJulia | And it nicely unpeels the confusion and asserts the right volume \o/ | 18:20 |
TheJulia | and can be backported | 18:20 |
opendevreview | Julia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/898006 | 19:38 |
opendevreview | Julia Kreger proposed openstack/sushy-tools master: Add Support testing for HttpBootUri https://review.opendev.org/c/openstack/sushy-tools/+/896963 | 19:40 |
opendevreview | Julia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/898006 | 19:46 |
opendevreview | Julia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/898006 | 19:57 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP/DNM: Advanced vmedia deployment test ops https://review.opendev.org/c/openstack/ironic/+/898010 | 21:01 |
opendevreview | Julia Kreger proposed openstack/ironic-python-agent-builder master: Remove USE_PYTHON3 option https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/898241 | 21:43 |
opendevreview | Julia Kreger proposed openstack/ironic-python-agent-builder master: Add Glean into TinyIPA image https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/898242 | 21:45 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!