rpittau | good morning ironic! o/ | 06:47 |
---|---|---|
rpittau | wow, I see we're still struggling with the unit tests timeouts :/ | 07:58 |
rpittau | JayF, TheJulia, do we want to increase the MIGRATIONS_TIMEOUT value for the time being? | 08:01 |
rpittau | also, I think we should change the gentle value for the timeout to False for all the tests, not just the migration ones | 08:07 |
opendevreview | Merged openstack/ironic-python-agent master: Fix Bandit errors https://review.opendev.org/c/openstack/ironic-python-agent/+/879912 | 09:25 |
opendevreview | likui proposed openstack/python-ironicclient master: These should be indented 4 spaces to match the other things in this block https://review.opendev.org/c/openstack/python-ironicclient/+/794691 | 09:58 |
opendevreview | Merged openstack/ironic-python-agent master: Fix nvidia hardware manager url parser to permit https https://review.opendev.org/c/openstack/ironic-python-agent/+/881410 | 10:11 |
Nisha_Agarwal | scottsol, hi | 10:26 |
iurygregory | good morning Ironic | 11:09 |
iurygregory | rpittau, re unit tests, I think TheJulia was able to reproduce things locally I seem to remember 4 patches that maybe would be necessary to fix https://review.opendev.org/c/openstack/ironic/+/885797 https://review.opendev.org/c/openstack/ironic/+/886865 https://review.opendev.org/c/openstack/ironic/+/886871 https://review.opendev.org/c/openstack/ironic/+/886881 | 11:13 |
TheJulia | rpittau: what is the present setting? | 13:07 |
iurygregory | TheJulia, MIGRATIONS_TIMEOUT is set to 60 | 13:12 |
iurygregory | good morning =) | 13:12 |
iurygregory | and the set for other tests is OS_TEST_TIMEOUT is 30 | 13:14 |
TheJulia | 60 seconds? | 13:18 |
TheJulia | the second migration test takes consistently 70 seconds on my desktop | 13:18 |
iurygregory | hummm | 13:18 |
iurygregory | interesting | 13:18 |
iurygregory | maybe we should change the value | 13:19 |
TheJulia | yeah | 13:54 |
opendevreview | Julia Kreger proposed openstack/ironic master: Fix test_migrations with firmware information. https://review.opendev.org/c/openstack/ironic/+/886881 | 13:58 |
JayF | we did cross a threshold then, eh | 14:06 |
JayF | just a threshold of doing the wrong thing more than the number of conns we had or somehting like | 14:06 |
JayF | that | 14:06 |
opendevreview | Julia Kreger proposed openstack/ironic master: CI: Change migrations timeout to be >60 seoncds https://review.opendev.org/c/openstack/ironic/+/886985 | 14:08 |
TheJulia | yeah, it might explain why test_migrations has always sort of been a little more likely to fail | 14:26 |
JayF | I'm adding a unit tests section to the meeting agenda | 14:45 |
JayF | Just in case I missed a patch, you wanna check up? https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting | 14:51 |
iurygregory | I think we also need https://review.opendev.org/c/openstack/ironic/+/885797 and https://review.opendev.org/c/openstack/ironic/+/886871 | 14:54 |
JayF | I'll make SQL2.0 tests a separate line item | 14:54 |
JayF | but it's a good callout | 14:54 |
iurygregory | makes sense | 14:54 |
iurygregory | ++ | 14:54 |
JayF | #startmeeting ironic | 15:01 |
opendevmeet | Meeting started Mon Jun 26 15:01:06 2023 UTC and is due to finish in 60 minutes. The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:01 |
opendevmeet | The meeting name has been set to 'ironic' | 15:01 |
JayF | Gooood morning Ironic'ers! Who all is here this morning? | 15:01 |
iurygregory | o/ | 15:01 |
rpittau | o/ | 15:01 |
matfechner | o/ | 15:01 |
TheJulia | o/ | 15:01 |
JayF | #topic Announcements/Reminder | 15:02 |
JayF | This is where I'd usually note to review things wiht #ironic-week-prio and to mark things as #ironic-week-prio if you want it reviewed | 15:02 |
JayF | but frankly, unit tests are a mess (to be addressed later) and we should not approve anything until they are fixes | 15:02 |
JayF | please point review love at the priority order of: unit tests; ironic-week-prio ;) | 15:02 |
JayF | There were no action items out of our 6/19 meeting, so I'm skipping that agenda item. | 15:03 |
JayF | #topic Reivew Ironic CI status | 15:03 |
JayF | you know, I put all the commenst in open discussion, but I think we can talk about this here | 15:03 |
JayF | Unit tests are broken! At least, they are until more things merge | 15:03 |
JayF | there are at least three separate things that we need to get unit tests back in good shape | 15:04 |
JayF | First: Failures when running migrations | 15:04 |
JayF | the changes we believe will fix this, and undo any attempts to mitigate the damage since it's been busted are | 15:04 |
JayF | #link https://review.opendev.org/c/openstack/ironic/+/886985 | 15:04 |
JayF | #link https://review.opendev.org/c/openstack/ironic/+/886881 | 15:04 |
JayF | #link https://review.opendev.org/c/openstack/ironic/+/886196 | 15:04 |
JayF | Second: Failures in TestNeutronVifPortMixing | 15:05 |
JayF | this one, we don't have fixed, but it's not really OK for it to fail intermittantly all the time either | 15:05 |
JayF | I proposed a change to disable it, and a bug to track re-enabling | 15:05 |
JayF | #link https://review.opendev.org/c/openstack/ironic/+/886911 | 15:05 |
JayF | #link https://bugs.launchpad.net/ironic/+bug/2024994 | 15:05 |
JayF | If you don't think this is the right approach; please comment in the bug and/or the change, I'm open to other solutions but not really Ok with us just suffering under a test that fails 5-10% of the time for any longer :D | 15:06 |
JayF | and finally | 15:06 |
JayF | Adding tests to ensure SQLA2.0 compat is not broken | 15:06 |
JayF | #link https://review.opendev.org/c/openstack/ironic/+/885797 to fix tests under sqla 2.0 | 15:06 |
JayF | #link https://review.opendev.org/c/openstack/ironic/+/886020 adds a sqla 2.0 job (we need to rebase this once tests are fixed) | 15:06 |
JayF | #link https://review.opendev.org/c/openstack/ironic/+/886871 Fixes an SAWarning around DB schema -- I'm not sure this is really testing related, but I think it's another change we need to land to make CI fully happy | 15:07 |
JayF | Any thoughts on this? Did I miss anything? Any discussion we wanna have around this? | 15:07 |
TheJulia | to at least... remove a warning that will become an error at some point | 15:07 |
rpittau | JayF: thanks, I believe that's all, and a lot :) | 15:08 |
TheJulia | No issues, I believe the first few are already in route to CI | 15:08 |
JayF | This all just makes me wonder if there's something we can look at from a higher level to help in the future | 15:08 |
JayF | like writing something in migrations tests to ensure you never pass a SQLA object in the data dict | 15:09 |
JayF | I don't really know, but it seems obvious we need more guardrails given the age of some of the miscoded tests fixed in 886881 | 15:09 |
TheJulia | It is not the most widely known aspect of sqlalchemy, fwiw | 15:10 |
JayF | and it's going to crop up more as people do sqla migrations | 15:10 |
TheJulia | so definitely not "easy" | 15:10 |
JayF | Part of me wonders if there's a guardrail to put in for oslo db | 15:11 |
JayF | to try and "save" the smaller projects from tha pain we just felt over the last two weeks | 15:11 |
TheJulia | That can be a whole debate | 15:11 |
JayF | well I might spike on having the discussion, if it gets to code then I can wail against that lol | 15:12 |
TheJulia | heh | 15:12 |
JayF | but I look at this this way: if it took us two weeks-ish, as a team that has more sqla strength than most | 15:12 |
opendevreview | Merged openstack/ironic master: Fix test_migrations with firmware information. https://review.opendev.org/c/openstack/ironic/+/886881 | 15:12 |
JayF | then a smaller project with a similar bug has very little chance | 15:12 |
JayF | but that's not an ironic problem; but I'll still try and spread the love of our lessons learned | 15:12 |
JayF | going to move on now unless there's further? | 15:13 |
TheJulia | no objection here | 15:13 |
JayF | #topic Review ongoing 2023.2 workstreams | 15:13 |
JayF | #link https://etherpad.opendev.org/p/IronicWorkstreams2023.2 | 15:13 |
JayF | I've been behind on reviewing some of this stuff because I've been pointing all my brainpower at tests; going to try and correct that this week. | 15:14 |
JayF | no comments on workstreams, moving on | 15:16 |
JayF | #topic Open Discussion | 15:16 |
JayF | I have a note here on PTL availability | 15:16 |
JayF | #note JayF will be mostly unavailable upstream the week of 7/10. | 15:16 |
TheJulia | do you need someone to run the meeting that week? | 15:17 |
JayF | Can someone volunteer to run the 7/10 and also 7/17 meetings? I should be here week of 7/17 just being careful | 15:17 |
JayF | That week I'll be visiting my coworkers in the UK, so I'll be in a different timezone even if I get to point my brain at upstream :D | 15:17 |
* TheJulia checks calendars | 15:18 | |
iurygregory | I can't =( | 15:18 |
iurygregory | 7/10 I'm on vacation | 15:18 |
JayF | Worse case, if I put this call out next week and get some "no"s, we can cancel the meeting. | 15:19 |
iurygregory | 7/17 I'm traveling back .-. | 15:19 |
JayF | This has been the most substantive one in a while just because of the unit tests :D | 15:19 |
TheJulia | I'm available | 15:19 |
JayF | for both? | 15:19 |
TheJulia | yeah | 15:19 |
rpittau | I'm also available | 15:19 |
JayF | who wants what? | 15:19 |
TheJulia | rpittau: if you'd like to, the gavel is yours | 15:19 |
JayF | You tell me and I'll put it in an action so we can have it marked | 15:19 |
JayF | as it is voluntold, so shall it be written :P | 15:19 |
JayF | #action rpittau to run Ironic meeting 7/10 and 7/17 | 15:20 |
TheJulia | hehehe | 15:20 |
TheJulia | lol | 15:20 |
TheJulia | evil man, evil :) | 15:20 |
JayF | Anything else for Open Discussion? | 15:20 |
rpittau | yay (?) | 15:20 |
TheJulia | rpittau: worst comes to worst, let lmk :) | 15:20 |
JayF | rpittau: it'll be your turn for PTL soon ;) | 15:20 |
TheJulia | heh | 15:20 |
JayF | rpittau: just embarce the gavel | 15:20 |
* rpittau hides | 15:20 | |
* TheJulia hears evil laughter from the background | 15:20 | |
* TheJulia notes in our line of work, evil is good. | 15:20 | |
iurygregory | ++ | 15:21 |
JayF | Anything non-philosophical for the rest of open discussion? hah | 15:21 |
JayF | Aight, I think that does it then. | 15:22 |
JayF | #endmeeting | 15:22 |
opendevmeet | Meeting ended Mon Jun 26 15:22:07 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:22 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-06-26-15.01.html | 15:22 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-06-26-15.01.txt | 15:22 |
opendevmeet | Log: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-06-26-15.01.log.html | 15:22 |
johnthetubaguy | JayF: as a heads up, I got the first two shard related nova patches up, in a very draft form: https://review.opendev.org/c/openstack/nova/+/886980 | 15:25 |
JayF | cool, I'll log that in https://etherpad.opendev.org/p/IronicWorkstreams2023.2 | 15:25 |
TheJulia | cool | 15:25 |
JayF | exciting but it also means I gotta get on with helping on the documentationing soon | 15:25 |
JayF | so in the past ~4 working hours | 15:26 |
JayF | unit tests are fixed | 15:26 |
JayF | sharding is drafted | 15:26 |
JayF | I'm going to go get some breakfast and lottery tickets :D | 15:27 |
johnthetubaguy | :) | 15:28 |
TheJulia | lol | 15:29 |
rpittau | JayF: https://review.opendev.org/c/openstack/ironic/+/886911 needs some pep8 love :) | 15:29 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Skip tests that fail occassionally in CI https://review.opendev.org/c/openstack/ironic/+/886911 | 15:30 |
JayF | I was running a locally cached copy of pep8 between my ears when I edited that on my windows box yesterday LOL | 15:31 |
opendevreview | Verification of a change to openstack/ironic master failed: CI: Change migrations timeout to be >60 seoncds https://review.opendev.org/c/openstack/ironic/+/886985 | 15:36 |
JayF | > openstack-tox-py39 https://zuul.opendev.org/t/openstack/build/7d9d11500f1c43b39987388adc072c0e : TIMED_OUT in 41m 24s | 15:36 |
JayF | oh no | 15:36 |
JayF | 2023-06-26 15:20:43.696512 | ubuntu-focal | {2} ironic.tests.unit.objects.test_volume_target.TestVolumeTargetObject.test_save [0.226305s] ... ok | 15:36 |
JayF | 2023-06-26 15:35:13.828453 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/zuul/zuul-jobs/playbooks/tox/run.yaml@master] | 15:36 |
TheJulia | .... wut | 15:37 |
rpittau | mmm we may need to set gentle to False as I wrote this morning | 15:37 |
* TheJulia wonders if it actually pulled with the parent | 15:37 | |
JayF | ...wouldn't that be an incredibly severe bug if it didn't? | 15:38 |
TheJulia | well, I think it comes down to depends-on behavior | 15:39 |
TheJulia | since it was tagged, not rebased upon | 15:39 |
JayF | it wasn't tagged depends-on | 15:40 |
JayF | it was on top of your change | 15:40 |
JayF | relationally | 15:40 |
TheJulia | It didn't look like it earlier | 15:40 |
* TheJulia will be able to look again in a few minutes | 15:40 | |
JayF | https://review.opendev.org/c/openstack/ironic/+/886985 looking at relation chain in top right | 15:40 |
JayF | I'm going to recheck because what the hell else am I going to do | 15:40 |
JayF | but I'll be screaming in anguish as I type it it :{ | 15:40 |
iurygregory | =( | 15:43 |
iurygregory | lay down, try to not cry, cry a lot | 15:43 |
TheJulia | 2f8ee2cf4..d9a000020 master -> master <-- that is what i would epxect | 15:43 |
TheJulia | le sigh | 15:43 |
TheJulia | expect | 15:44 |
JayF | is it possible the timeout fixture itself is somehow causing an issue? | 15:44 |
JayF | https://review.opendev.org/c/openstack/ironic/+/886196 will be the ... non-change test | 15:44 |
rpittau | JayF: I can't exclude it 1005, but it would be really weird | 15:44 |
JayF | because we're just enabling an additional test that would cause it to be more likely to fail if broken in this way | 15:45 |
rpittau | I can override the value of gentle and see if we can get an actual failure | 15:46 |
rpittau | the problem is that now if a test goes in timeout we won't see it because the exception will be caught, with gentle=False it will just throw the exception and stop the execution | 15:46 |
JayF | so the code I just rechecked is ~useless in it's current form? | 15:46 |
JayF | you wanna flip the bool in the change so it does the useful? | 15:46 |
rpittau | ehm... yeah :D | 15:46 |
JayF | my change to re-enable that test is landing smoothly | 15:47 |
JayF | all jobs in gate and done except docs | 15:48 |
rpittau | yep | 15:48 |
rpittau | I'll do it in a separate patch, don't want to mix stuff where we are now | 15:48 |
JayF | I wouldn't hate the mix because if it fails, we'll know why more clearly | 15:48 |
JayF | and I'm here and will kick it back into the gate immedaitely | 15:48 |
rpittau | alright then | 15:49 |
opendevreview | Merged openstack/ironic master: Revert "Disabling test_upgrade_twice temporarily for CI fix" https://review.opendev.org/c/openstack/ironic/+/886196 | 15:49 |
* TheJulia wonders if we have a different problem on postgres | 15:50 | |
TheJulia | I was testing with just mysql locally | 15:50 |
TheJulia | I didn't have the same setup for postgres | 15:50 |
JayF | we can see, but rpittau's change with the gentle=False will shed more light on that | 15:52 |
JayF | and in the meantime we seem to have at least closed the window some | 15:52 |
JayF | I think the only 100% never-false-negative test would be something like def test(): return True | 15:52 |
JayF | we just keep making progress and getting it better | 15:52 |
opendevreview | Riccardo Pittau proposed openstack/ironic master: CI: Change migrations timeout to be >60 seoncds https://review.opendev.org/c/openstack/ironic/+/886985 | 15:57 |
rpittau | should've probably changed the commit msg :/ | 15:57 |
JayF | do it in an edit real quick before I land it? | 15:57 |
opendevreview | Riccardo Pittau proposed openstack/ironic master: CI: Change migrations timeout to be >60 seoncds https://review.opendev.org/c/openstack/ironic/+/886985 | 15:58 |
opendevreview | Jay Faulkner proposed openstack/ironic master: CI: Change migrations timeout to be >60 seconds https://review.opendev.org/c/openstack/ironic/+/886985 | 15:58 |
JayF | I fixed the typo, too :D it's approved | 15:59 |
rpittau | ! | 15:59 |
JayF | It's OK, easier to spell when you only know one language when everyone else has to know 2+ :D | 15:59 |
rpittau | not all typos :) | 16:01 |
JayF | lol I fixed the one that I saw everytime the patchset bumped | 16:01 |
JayF | c'mon, we know the truth as software devs: the first line of our commit message is likely the only parts that'll be read after today :D | 16:01 |
opendevreview | Riccardo Pittau proposed openstack/ironic master: CI: Change migrations timeout to be >60 seoncds https://review.opendev.org/c/openstack/ironic/+/886985 | 16:02 |
rpittau | ok done | 16:02 |
rpittau | there was an error in the new variable, it would've caused issues | 16:02 |
JayF | https://zuul.opendev.org/t/openstack/build/6dd34d94f5c64cfca0b8eedec2de4e3b | 16:06 |
JayF | 2023-06-26 15:41:55.457564 | ubuntu-jammy | {4} ironic.tests.unit.test_base.DontBlockExecuteTestCase.test_no_exception_raised_for_execute [0.040523s] ... ok | 16:06 |
JayF | 2023-06-26 16:02:49.795961 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/zuul/zuul-jobs/playbooks/tox/run.yaml@master] | 16:06 |
JayF | WHY | 16:06 |
JayF | TheJulia: rpittau: ^ gate check for my patch failed on unit tests with driver libs | 16:06 |
JayF | froze up for 20 minutes | 16:06 |
rpittau | oO | 16:10 |
iurygregory | O.O | 16:11 |
JayF | hopefuly rpittau's patch will land and shed light on WTF | 16:15 |
JayF | whoa | 16:30 |
JayF | The following tests exited without returning a status | 16:30 |
JayF | and likely segfaulted or crashed Python: | 16:30 |
JayF | * ironic.tests.unit.drivers.modules.irmc.test_boot.IRMCVirtualMediaBootTestCase.test_clean_up_instance_with_secure_boot | 16:30 |
JayF | * ironic.tests.unit.drivers.modules.irmc.test_boot.IRMCVirtualMediaBootTestCase.test_prepare_instance_with_secure_booty | 16:30 |
JayF | that's on rpittau's change | 16:30 |
JayF | and is /also/ in the with-drivers category | 16:30 |
JayF | I bet we have a bad test in there, too | 16:30 |
TheJulia | ooooooh | 16:30 |
TheJulia | so! | 16:30 |
JayF | https://zuul.opendev.org/t/openstack/build/cc73bf5c96064178af9de32686d31dd2 | 16:30 |
TheJulia | speaking of! I believe vanou pushed an edit, I wonder if python-irmcclient got updated | 16:31 |
JayF | well nevermind ... this is hung https://zuul.opendev.org/t/openstack/stream/958243d5237249549fdaa121425604d5?logfile=console.log | 16:31 |
TheJulia | specifically to virtual media | 16:31 |
JayF | and the migrations timeout doesn't look like it's firing | 16:31 |
JayF | so we still have the hang, and we have another issue now, too? | 16:31 |
TheJulia | dunno, I just know i wasn't able to reproduce it on friday | 16:31 |
TheJulia | at least, with the patch when I could before | 16:31 |
JayF | well, we got a *lot* of passes, too | 16:33 |
JayF | I think at least 6/7 runs without seeing the timeout | 16:33 |
JayF | which might just mean we flipped the coin 7 times and got heads every time | 16:33 |
TheJulia | indeedly possible | 16:35 |
TheJulia | I was basically failing every run locally before, I think I had one pass before the patch on friday | 16:35 |
TheJulia | on that specific machine | 16:35 |
JayF | was there any other change you had dangling in that checkout that could've made a difference? | 16:37 |
JayF | the object layer delete piece, maybe? | 16:37 |
JayF | my original proposed fix that proved to at least be not enough | 16:37 |
TheJulia | I was rebasing the hold steps stuff | 16:38 |
TheJulia | and then took a clean copy, reproduced, fix there and then fixed on my local branch and had both passing | 16:38 |
JayF | did the clean copy ever fail? | 16:38 |
TheJulia | yes | 16:38 |
JayF | damn | 16:38 |
JayF | there is something in play here we still haven't sussed out then, yeah | 16:39 |
JayF | or adding int(node[id]) in those two places somehow made it worse, not better | 16:39 |
TheJulia | node.id | 16:39 |
TheJulia | they aren't json dicts | 16:39 |
JayF | sure, I'm just opening the code now so was going by memory | 16:39 |
TheJulia | ahh, ok | 16:39 |
JayF | I'm going to reread that file of tests again | 16:40 |
JayF | and see if there's anything I can figure out | 16:40 |
JayF | tempted to push a patch with nothing but enabling sqla debug logging | 16:42 |
JayF | if I did that would probably need to limit the tests to only test_migrations.py | 16:42 |
JayF | not even sure that'd tell us what we need though | 16:42 |
JayF | something weird about rpittau's change | 16:44 |
JayF | we're calling self.useFixture on an object class | 16:44 |
JayF | even though at usage time, it's mixed into classes that have that attr | 16:44 |
JayF | I don't think that's a potential cause, but saying it out loud | 16:45 |
JayF | we do the same in WalkVersionsMixin since forever | 16:47 |
rpittau | JayF: all the tests call self.useFixture because they're based on oslo test tests class, we're just overriding it | 16:53 |
JayF | well, they aren't in terms of actual inheretance | 16:53 |
JayF | my use of typed python in personal projects is showing here | 16:54 |
JayF | we're instantiating it as an object with intent to be mixed into an oslo test class | 16:54 |
JayF | it all works, was just something I noticed and when I saw we already had it (in WalkVersionsMixin) I discarded it as a "new" breakage | 16:54 |
JayF | I am curious about _check_487deb87cc9d | 16:55 |
JayF | it's weird to see patterns of using the engine in db_utils.get_table(engine, 'x') | 16:55 |
JayF | then a few lines down we are using with engine.begin() as connection | 16:55 |
JayF | and get_table looks like the kind of sqla magic we've been trying to get rid of | 16:56 |
rpittau | going offline now, see ya tomorrow! o/ | 16:56 |
JayF | o/ | 16:57 |
JayF | yeah that's probably fine, reading the sqla code | 16:59 |
JayF | TheJulia: would it be a confirmation this bug is kinda what we think it is, if we put the autocommit back in and it started passing again? | 17:09 |
TheJulia | possibly, got a link? | 17:09 |
JayF | I can make that patch if you think it's worth a shot | 17:10 |
TheJulia | k, on a call at the moment | 17:11 |
opendevreview | Jay Faulkner proposed openstack/ironic master: DNM: Revert "Remove autocommit, again." https://review.opendev.org/c/openstack/ironic/+/886995 | 17:12 |
JayF | oooh I found a thing, I think | 17:15 |
JayF | https://opendev.org/openstack/ironic/src/branch/master/ironic/tests/unit/db/sqlalchemy/test_migrations.py#L544 | 17:16 |
JayF | I wonder if we need to str() that row['network_interface'] | 17:16 |
JayF | or really just move the asserts up a level is a smaller change | 17:20 |
* JayF puts that change in his local branch and keeps looking | 17:21 | |
JayF | found another case similar to that | 17:22 |
TheJulia | you can access items as a dict or as a object | 17:33 |
TheJulia | at least, dependin gon pattern | 17:33 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Ensure all sqla objects descoped before ending txn https://review.opendev.org/c/openstack/ironic/+/887000 | 17:33 |
JayF | so if it's accessed as a dict, it's likely safe, accessed like an object, it's likely not | 17:33 |
JayF | so I think that makes ^^^ a noop then | 17:34 |
JayF | going to let CI finish just to see but yeah | 17:34 |
TheJulia | yeah, I think that is right | 17:34 |
TheJulia | so 544 is not doing anything that would be returned outside and keep a handler open | 17:35 |
TheJulia | at least, the way I'm interpretting it | 17:35 |
JayF | ack | 17:35 |
JayF | then I'm basically out of ideas | 17:35 |
JayF | other than a ton of "print statement debugging" in the gate | 17:35 |
JayF | which might heisenbug us | 17:36 |
NobodyCam | Good Morning Ironic folks | 17:40 |
JayF | you're half right | 17:41 |
JayF | well little more than half | 17:41 |
JayF | it is morning UGT | 17:41 |
JayF | and we are ironic folks | 17:41 |
JayF | lol | 17:41 |
NobodyCam | LoL | 17:41 |
NobodyCam | hehehee | 17:42 |
NobodyCam | hey hey JayF | 17:42 |
JayF | gnomes stole my happiness shortly after the Ironic meeting, and I'm hunting for it again :P | 17:42 |
NobodyCam | happen to know off the top of your head if we emit a message when entering / exiting rescue mode? | 17:44 |
JayF | we emit an oslo.notification for sure | 17:45 |
JayF | usually that means a log too, or you can certainly configure notifications to log | 17:46 |
JayF | but I'd have to check the code to be sure | 17:46 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Ensure all sqla objects descoped before ending txn https://review.opendev.org/c/openstack/ironic/+/887000 | 17:47 |
JayF | well, whether we think so or not, this one passed | 17:47 |
JayF | fixed the pep8 and will try it again | 17:47 |
JayF | interesting | 17:48 |
JayF | on the autocommit-enabling CI job | 17:49 |
JayF | 2023-06-26 17:27:38.962571 | ubuntu-jammy | * ironic.tests.unit.db.sqlalchemy.test_migrations.TestMigrationsMySQL.test_upgrade_and_version and ironic.tests.unit.db.sqlalchemy.test_migrations.ModelsMigrationsSyncMysql.test_models_sync bailed | 17:49 |
JayF | so I guess we know it's still in mysql, and it made the migrations timoue work better(?) to have autocommit on(?) | 17:49 |
TheJulia | uuggggghhhhh | 17:54 |
JayF | hmm test_models_sync | 17:55 |
JayF | I really wanna put the object deleter back in | 17:55 |
* JayF goes and reads it again | 17:55 | |
JayF | hmm test_models_sync adds a cleanup to drop all objects | 18:00 |
JayF | we are dangling a DB connection somewhere | 18:00 |
JayF | TheJulia: I'm pretty sure I'm seeing that even the connection.execute() that returns a dict-like object is still a sqlalchemy dict cursor | 18:04 |
TheJulia | the result set class? | 18:05 |
TheJulia | depends on how you ask for the entry, if memory serves | 18:05 |
JayF | no, a ResultProxy | 18:05 |
TheJulia | ahhhhhhhhh | 18:05 |
JayF | https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Connection.execute | 18:05 |
TheJulia | yeah, the resultproxy is evil | 18:05 |
JayF | that is 13, I should check 14 | 18:05 |
TheJulia | 1.4 depends on style of invocation | 18:05 |
JayF | a CursorResult | 18:05 |
TheJulia | in 2.0, we want tuples | 18:05 |
JayF | heh | 18:06 |
TheJulia | ORMey things generally get a cursor result, if you ask directly, you get the result tuple set of data | 18:06 |
JayF | it depends on /version of sqla at invocation/ if I'm reading this right | 18:06 |
TheJulia | correct | 18:06 |
JayF | if in a 1.x context; it returns the CursorResult which is a result proxy, yeah | 18:06 |
JayF | and my patch is about to succeed in CI again | 18:07 |
JayF | > This object returns rows as LegacyRow objects, which maintains Python mapping (i.e. dictionary) like behaviors upon the object itself. Going forward, the Row._mapping attribute should be used for dictionary behaviors. | 18:09 |
JayF | I think my patch is the real fix, and I think this fits the whole story we've been telling the whole time | 18:09 |
JayF | I just can't answer the question of why this passed for so long | 18:09 |
* TheJulia glares at workday | 18:10 | |
TheJulia | JayF: I think it was failing every so often | 18:10 |
TheJulia | just as you suggested, we passed a threshold | 18:10 |
TheJulia | at some point, all of the issues colided and there was extreme sadness | 18:10 |
JayF | I still have extreme sadness | 18:10 |
JayF | lol | 18:11 |
JayF | ironic-cross-sushy seems to be frozen now | 18:12 |
JayF | in my "hey I fixed it, see my explanation" PR | 18:12 |
JayF | https://zuul.opendev.org/t/openstack/stream/fc7c761fb23f4c148edd36ddbf83320c?logfile=console.log | 18:13 |
JayF | fungi: if I have a job that's currently running right now | 18:16 |
JayF | fungi: can you hold the node? | 18:16 |
JayF | fungi: https://zuul.opendev.org/t/openstack/stream/fc7c761fb23f4c148edd36ddbf83320c?logfile=console.log I'd LOVE to get ahold of this one | 18:16 |
clarkb | I can do it. I'm not sure fungi is around | 18:16 |
clarkb | can you share the info we need to hold it, project, change and job name? | 18:16 |
JayF | this is openstack/ironic ironic-cross-sushy | 18:17 |
JayF | https://review.opendev.org/c/openstack/ironic/+/887000 this change | 18:17 |
TheJulia | le sigh | 18:18 |
clarkb | ok hold is in place | 18:18 |
JayF | thank you | 18:18 |
TheJulia | whew, and it just timed out it looks like | 18:19 |
JayF | it didn't catch it, I think | 18:20 |
JayF | probably because it TIMED_OUT? | 18:20 |
clarkb | or I was too slow | 18:20 |
TheJulia | ugh | 18:20 |
clarkb | I think I caught it | 18:20 |
clarkb | I can ssh into the node at least. Maybe it is trying to delte and hasn't yet | 18:21 |
JayF | yeah I see it in https://zuul.opendev.org/t/openstack/nodes as deleting | 18:21 |
JayF | at least the one I think it is lol | 18:21 |
clarkb | 0034460245 is the node its held according ot nodepool | 18:21 |
clarkb | have an ssh pubkey I can add to the node? | 18:21 |
JayF | ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILR/OLTS/VWHzE3vgBFCaTNBg2+MRCENOmDr9oEMxzhZ jay@jvf.cc | 18:22 |
clarkb | JayF: root@158.69.66.88 | 18:23 |
JayF | this is way more interesting than yours, I think | 18:25 |
JayF | https://gist.github.com/jayofdoom/d0f38dce541ed4f2600e6d5d5cc34ace | 18:26 |
JayF | looks like we're trying to select from conductors while also trying to alter conductors and allocations table | 18:26 |
JayF | it's like two tests are stepping on each other or something | 18:26 |
JayF | I can confirm two separate python processes are accessing and deadlocked on the same DB | 18:28 |
JayF | but I'm not sure if that's meaningful, but there's absolutely multiple processes here talking to the same DB (and I don't mean DB host; I mean *the same DB string*) | 18:28 |
TheJulia | so here is one for you | 18:29 |
TheJulia | I consistently had 8 hanging connections per db last week | 18:29 |
TheJulia | so we're orphaning stuff someplace | 18:29 |
TheJulia | still | 18:29 |
JayF | but like, is it ever sane that multiple different python processes would be touching the DB? | 18:29 |
JayF | at the same time? | 18:30 |
JayF | shouldn't each test run in a single process, and each test have it's own BD? | 18:30 |
TheJulia | oh, still the same, but two different dbs at once | 18:30 |
JayF | **DB | 18:30 |
TheJulia | each runner gets its own db | 18:30 |
JayF | yeah this is the problem | 18:30 |
TheJulia | AIUI | 18:30 |
JayF | look at my output | 18:30 |
JayF | at the source port | 18:30 |
JayF | the source port via netstat on the host maps back to two different python pids | 18:30 |
TheJulia | yeah | 18:30 |
JayF | those pids are running different testing load lists | 18:32 |
TheJulia | yup | 18:32 |
JayF | +ironic.tests.unit.db.sqlalchemy.test_migrations.ModelsMigrationsSyncMysql.test_models_sync # scheduled to one | 18:32 |
JayF | https://gist.github.com/jayofdoom/d0f38dce541ed4f2600e6d5d5cc34ace?permalink_comment_id=4611051#gistcomment-4611051 | 18:33 |
JayF | so somehow they ended up running these tests against the same db at the same time | 18:33 |
JayF | which is exceedingly wrong | 18:33 |
clarkb | this looks similar to the problem we hit in zuul's db migrations where we had nested transactions. The outer one would hold a lock for some table and then the inner one couldn't update it | 18:34 |
clarkb | the fix was to flatten the transactions. But this was only a problem with sqla 2.0 not 1.4 | 18:34 |
JayF | but in this case, clarkb, we have two different python test runners talking to *the same DB on the same DB host* | 18:34 |
JayF | so the deadlock is coming from that conflict, not from a single test if I understand the flow correctly | 18:34 |
JayF | it's very possible I don't understand the flow correctly | 18:35 |
JayF | but until I hear otherwise, or convince myself, I'm working under the assumption that two different python PIDs, running different sets of tests, connecting to the same DB on the same DB host is a big nono | 18:35 |
JayF | I gotta find where those test names are created | 18:36 |
JayF | er, db names | 18:36 |
clarkb | where do you see multiple processes talking to the same db? | 18:36 |
TheJulia | +1 to clarkb's question | 18:36 |
JayF | so look at https://gist.github.com/jayofdoom/d0f38dce541ed4f2600e6d5d5cc34ace | 18:36 |
clarkb | there are multiple tcp connections but it may be from the same process | 18:36 |
JayF | then I feed that into | 18:36 |
JayF | netstat -nptu | 18:36 |
JayF | grep out the source port and map it back to two different python pids | 18:36 |
clarkb | note there are two dbs in the db listing too | 18:37 |
JayF | yes, I know | 18:37 |
JayF | thank you for asking for reciepts | 18:38 |
JayF | helped me realize I tranposed the port number | 18:38 |
JayF | both those connections to setsuipeap that are deadlocked *are* from the same process | 18:38 |
JayF | I just fat fingered a number/copy+pasted one row off, something like that | 18:38 |
TheJulia | c | 18:40 |
TheJulia | err | 18:40 |
TheJulia | c | 18:40 |
TheJulia | wow I cannot type this morning | 18:40 |
TheJulia | c'est la vie | 18:40 |
JayF | I wonder if like, oslotest started doing cleanups async or something | 18:43 |
JayF | or unittest | 18:43 |
JayF | add_allocations_table seems to be where it blew up | 18:48 |
JayF | oh | 18:48 |
JayF | OH!!!! | 18:48 |
JayF | TheJulia: ^^^ | 18:48 |
JayF | this is your fix | 18:48 |
JayF | the sawarning | 18:49 |
JayF | we should see if that is exhibiting this same stuff if rebased on master | 18:49 |
JayF | this is frozen trying to remove the fk that the sawarning is complaining about | 18:49 |
JayF | the case I'm looking at rn | 18:50 |
TheJulia | waiting, it is blocked atm | 18:50 |
JayF | blocked on what? CI? | 18:50 |
TheJulia | one of the other threads | 18:50 |
TheJulia | s/threads/connections/ | 18:50 |
TheJulia | so the ones in Sleep() state are still open, they need to close out | 18:50 |
TheJulia | so I think the answer is we need to find any and all open places | 18:51 |
JayF | I have an error in the log saying it can't drop conductors table | 18:51 |
JayF | bceause the fk to allocations still exists | 18:51 |
JayF | deep in the output of the frozen sushy test | 18:51 |
JayF | which is what sent me down this path | 18:51 |
TheJulia | That is just bizzar | 18:51 |
JayF | this is the random piece we've been looking for too -- the sawarning says it can't sort tables | 18:51 |
TheJulia | ... wonder if it "can't sort"... "to delete" | 18:52 |
JayF | that's sorta where I'm at | 18:52 |
JayF | we might need to stack your sawarning change with my chnge https://review.opendev.org/c/openstack/ironic/+/887000 | 18:52 |
JayF | TheJulia: maybe approve ^, see if we can get it in, then it'll make landing the sawarning fix easier | 18:55 |
JayF | it's a clear improvement even if it's not the only fix | 18:55 |
JayF | alternatively I can merge those two changes and try to get a clean-clean run | 18:55 |
TheJulia | done | 18:55 |
JayF | merge as in, make them a single change | 18:55 |
TheJulia | +a'ed | 18:56 |
JayF | TheJulia: I wonder if it's like... metadata.drop_all() fails if there are any capital-C-Connections, not if there are any connections to the db | 18:57 |
JayF | so the sleepy empty connections are probably OK; the outstanding queries would be a capital-C-Connection | 18:57 |
JayF | from sqlalchemy perspective | 18:57 |
TheJulia | Eh, they really need to get clsoed out | 19:00 |
JayF | but aren't we using a connection pool in the engine? | 19:01 |
TheJulia | yeah, but we can't do things if they have open result sets | 19:01 |
TheJulia | and we can't see that | 19:01 |
JayF | hmm | 19:01 |
JayF | Trying to think of a case where you could have `show full processlist;` be nothing but sleepy conns and things broken | 19:06 |
JayF | and then I realized, I don't know what things look like if I'm mid txn | 19:06 |
JayF | if this path doesn't work, I might science seeing what > begin(); insert stuff; # shows up like on the full processlist | 19:06 |
JayF | aight, going to feed the cats and myself (in that order per their demand), and see how much red is on the zuul status panel when I return | 19:07 |
opendevreview | Merged openstack/ironic master: Ensure all sqla objects descoped before ending txn https://review.opendev.org/c/openstack/ironic/+/887000 | 19:13 |
JayF | heeeey | 19:15 |
TheJulia | huh? | 19:16 |
JayF | it passed the gate | 19:16 |
JayF | that was nice | 19:16 |
JayF | just a celebratory heyyy | 19:16 |
JayF | https://review.opendev.org/c/openstack/ironic/+/886871 has already failed | 19:16 |
JayF | if I recheck, it'll rebase to master yeah? | 19:16 |
JayF | I would usually just hit the button but don't want to break your chain | 19:17 |
TheJulia | i guess you could | 19:17 |
TheJulia | https://review.opendev.org/c/openstack/ironic/+/885797 is at least mostly unrelated | 19:18 |
JayF | https://862c1d9848efa4012e00-65cf29ccd095bbc5984e6f01063fc964.ssl.cf1.rackcdn.com/886871/2/check/openstack-tox-py38/27a357f/job-output.txt has what looks like real failures | 19:18 |
JayF | unrealted to the stuff I just fixed, maybe | 19:18 |
JayF | 2023-06-26 18:58:06.634120 is the string to search | 19:18 |
* JayF off to pickup his lunch | 19:18 | |
clarkb | yes zuul always merges against the target branch | 19:19 |
TheJulia | all with postgres | 19:20 |
TheJulia | oh noes | 19:25 |
TheJulia | it looks like that breaks on postgres | 19:25 |
opendevreview | Julia Kreger proposed openstack/ironic master: Handle SAWarning around allocations FK Constratins https://review.opendev.org/c/openstack/ironic/+/886871 | 19:30 |
JayF | that's why I pointed it out | 19:37 |
JayF | TheJulia: pep8 unhappy on that change, you want me to fix it or you got it? | 19:47 |
TheJulia | i'll fix, i forgot since I had to jump on a call | 19:47 |
JayF | ack; np | 19:48 |
opendevreview | Julia Kreger proposed openstack/ironic master: Handle SAWarning around allocations FK Constratins https://review.opendev.org/c/openstack/ironic/+/886871 | 19:50 |
JayF | it timed out | 20:06 |
JayF | and the timeout worked on hte migrations test | 20:06 |
JayF | wtf | 20:06 |
JayF | like, zuul status FAILURE | 20:06 |
JayF | timed out via the fixture | 20:06 |
JayF | 2023-06-26 20:02:51.941686 in https://671fe52e0206ebfe4cbb-1ebf823a6670729b8775b6721553bbcb.ssl.cf5.rackcdn.com/886871/4/check/openstack-tox-py38/01a1ba0/job-output.txt | 20:07 |
JayF | MIGRATIONS_TIMEOUT just too low? | 20:07 |
JayF | and py39 / py310 have both been hung now for 8 minuts | 20:09 |
JayF | *minutes | 20:09 |
JayF | back to the drawing board | 20:10 |
JayF | ironic-cross-sushy hung too | 20:14 |
JayF | did we somehow make it more reproducable? wtf | 20:14 |
JayF | going to try to locally repro again | 20:16 |
JayF | just because if I can get it to repro, I might be able to debug it or something | 20:16 |
* JayF wonders if the hang being more reliable could be beacuse of the nature of that change | 20:41 | |
JayF | I'm going to recheck on a less invasive change | 20:41 |
TheJulia | k | 20:41 |
TheJulia | off calls, need to run to the post office real quick, I think mysql and postgres are next for me | 20:42 |
JayF | I even am running same sql version | 20:46 |
TheJulia | are you executing the "ye olde youtube video in HD in the background" while it is running? | 20:48 |
JayF | you saying I need to upgrade my gentoo while I run tests? | 20:48 |
JayF | is that what you are suggesting? | 20:49 |
TheJulia | I always have a youtube video running late in the day, and some of the unit test race issues we've seen in the past, have been reproduced under host load | 20:49 |
* JayF emerges the world | 20:50 | |
JayF | lucky gentoo comes with a built-in cpu cycle tester: the package manager | 20:51 |
JayF | lol | 20:51 |
JayF | TheJulia: is it possible that something like self.assertX() is holding onto a handle? | 20:58 |
JayF | TheJulia: like maybe in some kind of results collector or something | 20:58 |
JayF | so even in-scope for asserts we need to dupe the object? | 20:58 |
TheJulia | ... I wouldn't think so | 20:58 |
JayF | me neither, but that's implemented in C stdlib so I can't go read the code | 20:58 |
JayF | well, I have access to the code, I can't read it lol | 20:58 |
JayF | yeah doesn't look like it | 21:01 |
JayF | and it's python https://github.com/python/cpython/blob/3.8/Lib/unittest/case.py#L761 | 21:01 |
* TheJulia watches the slowest download ever | 21:05 | |
JayF | TheJulia: was pondering if the `if engine.dialect.name == 'mysql'` needed to include sqlite, I talked myself out of it | 21:05 |
JayF | because we don't support migrations on sqlite generally, yeah | 21:05 |
JayF | and I don't think we test against that either | 21:05 |
JayF | so it's not really likely to be an issue | 21:05 |
TheJulia | yeah, we've some some cases where advanced ops behave in not fun ways with differnces between postgres and mysql | 21:06 |
TheJulia | and *really* the only case we're goin gto break is on the teardown... really | 21:06 |
JayF | TheJulia: it'd be interesting science to see if your change worked with postgres version of these tests commented out; I suspect it won't though | 21:07 |
* JayF is onboard to drop psql support we find it's the culprit | 21:07 | |
JayF | just wonder about if it counts that it was deprecated forever and ever ago | 21:07 |
JayF | I have (former racker/ironicer) clif on a zoom with me now | 21:09 |
JayF | we're going to rubber duck at this | 21:09 |
TheJulia | https://671fe52e0206ebfe4cbb-1ebf823a6670729b8775b6721553bbcb.ssl.cf5.rackcdn.com/886871/4/check/openstack-tox-py38/01a1ba0/testr_results.html | 21:10 |
TheJulia | so... that is odd.... | 21:10 |
JayF | that is what I was looking at and commenting on earlier | 21:18 |
TheJulia | hmm | 21:26 |
TheJulia | different errors \o/ | 21:26 |
opendevreview | Julia Kreger proposed openstack/ironic master: Disable WAL Pragma for Unit Testing https://review.opendev.org/c/openstack/ironic/+/886865 | 21:29 |
opendevreview | Julia Kreger proposed openstack/ironic master: Handle SAWarning around allocations FK Constratins https://review.opendev.org/c/openstack/ironic/+/886871 | 21:29 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Fully ensure counts are out of scope of cxtmgr https://review.opendev.org/c/openstack/ironic/+/887008 | 21:29 |
JayF | TheJulia: ^^^^ | 21:29 |
JayF | TheJulia: clif helped me quack at it | 21:29 |
JayF | TheJulia: I moved it in but didn't change where it was declared | 21:30 |
JayF | that is 100% another instance of that, not sure it's the last one, but it certainly *is* one | 21:30 |
TheJulia | so I restacked my changes, they cleaned up failure errors in the unit test runs for me locally, fwiw | 21:30 |
JayF | nice | 21:30 |
JayF | wdyt about that I just pushed | 21:30 |
JayF | it's 100% an issue | 21:30 |
JayF | well, it's 100% a case of the thing I fixed before | 21:30 |
JayF | asusming those things are an issue which I'm fairly certain of | 21:31 |
TheJulia | could do it, this feels like a thousand cuts | 21:31 |
JayF | I counted, it's 9001 cuts exactly | 21:31 |
TheJulia | heh | 21:32 |
JayF | TheJulia: so I just had an epiphany -- we couldn't repro this in the job over the weekend b/c the clouds were underused | 21:46 |
JayF | TheJulia: so it was too fast to trigger the race | 21:47 |
JayF | which is why the 'fix' seemed so perfect compared to now | 21:47 |
TheJulia | heh | 21:48 |
JayF | now my change is passed everything but tox-py38 and tox-docs | 21:48 |
JayF | which just are queued and have been for a long time | 21:48 |
JayF | hash-tag better than a db hang | 21:48 |
iurygregory | I'm trying to catch up the whole conversation, tl;dr is that we would need this 3 changes to have unit tests happy? | 21:54 |
JayF | tl;dr we know nothing | 21:54 |
JayF | We hope https://review.opendev.org/c/openstack/ironic/+/886871 https://review.opendev.org/c/openstack/ironic/+/887008 will help | 21:54 |
JayF | all hope no knowledge | 21:54 |
JayF | lol | 21:54 |
iurygregory | JayF, perfect! | 21:54 |
iurygregory | I'm wondering about the drop of constraints in allocation, would we need to backport that? | 21:55 |
JayF | I really don't know | 21:56 |
JayF | I'm laser focused on green tests | 21:57 |
TheJulia | I wouldn't worry about backporting | 21:57 |
TheJulia | unless someone tries to pull in sqlalchemy 2.0 before we officially support it | 21:58 |
TheJulia | ooooh ahh, i postgres tests running locally | 21:58 |
JayF | clarkb: you can unhold that node for me, I squeezed all useful data outta it | 22:01 |
JayF | clarkb: if you want, you can also remove the other autohold with my name on it for now | 22:03 |
JayF | pass incoming for https://review.opendev.org/c/openstack/ironic/+/887008 if TheJulia iurygregory want to land it | 22:12 |
JayF | I see a +2, going to workflow it myself in a few minutes if nobody else does | 22:12 |
JayF | all voting jobs green on Julia's change too, we might be on the path | 22:16 |
* JayF landed that as a trivial change/ci change with one core approval | 22:19 | |
TheJulia | so, https://review.opendev.org/c/openstack/ironic/+/886865 also sort of fixes a bug, without it I get a nice error about the migration test trying to run that with mysql | 22:21 |
TheJulia | which of course, detonates | 22:22 |
TheJulia | that is when I explicitly ask for them to run | 22:22 |
JayF | yeah I got it too | 22:22 |
JayF | but that one is failing tests right now | 22:22 |
JayF | in the check queue | 22:22 |
TheJulia | gah!!!! | 22:27 |
opendevreview | Merged openstack/ironic master: Fully ensure counts are out of scope of cxtmgr https://review.opendev.org/c/openstack/ironic/+/887008 | 22:31 |
JayF | well, it passed multiple times in a row | 22:31 |
JayF | that's a good sign | 22:31 |
JayF | that change never once failed unit tests in the gate | 22:31 |
JayF | *knock on wood* | 22:31 |
JayF | going to rebase and recheck my "uncomment the super-breaky-test" | 22:32 |
JayF | oh, that landed | 22:32 |
JayF | so we passed that too! awesome | 22:32 |
TheJulia | oh, so here is a interesting aspect | 22:32 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Skip tests that fail occassionally in CI https://review.opendev.org/c/openstack/ironic/+/886911 | 22:33 |
TheJulia | locally, the migration test takes a fraction of the time it does in ci | 22:33 |
iurygregory | I've +2 https://review.opendev.org/c/openstack/ironic/+/885797/ | 22:33 |
JayF | landing it | 22:33 |
TheJulia | I guess so many direct io syncs out, that in a test VM it grinds and my ye olde code farm desktop, it is just nowhere near as fast | 22:33 |
JayF | good call out | 22:33 |
JayF | yep | 22:33 |
iurygregory | TheJulia, this would be expected *I think* CI we won't have a lot of resources | 22:33 |
TheJulia | but ci right now test_walk_version is 44 seconds | 22:34 |
JayF | Clif's first question when he started rubber ducking this with me was "are the CI nodes well resourced?" | 22:34 |
iurygregory | hummm | 22:34 |
TheJulia | yeah, just an example | 22:34 |
JayF | after my three hours of laughing we started troubleshooting | 22:34 |
JayF | we are our own noisy neighbor | 22:34 |
iurygregory | *well resourced* | 22:34 |
iurygregory | :D | 22:34 |
JayF | the first devoppy thing I ever did was setup a full LAMP stack in a 32mb linode | 22:34 |
JayF | now I can't even test software in eight gigabytes :P | 22:35 |
TheJulia | ahh, but that lamp stack took 26 mb and used swap ;) | 22:35 |
JayF | I actually had it not swapping | 22:36 |
JayF | but DB caches were near-universally disabled so it killed I/O anyway | 22:36 |
JayF | linode back in those days would actually monitor swap usage and rate limit it at the hypervisor level | 22:36 |
JayF | so if you had a regularly swapping VM, eventually it'd run out of rate limits and I/O would slow to a crawl | 22:36 |
JayF | pretty cool thing TBH; but those were user mode linux VMs, and were very slow :P | 22:36 |
JayF | I sure hope 886865,2 didn't have my fix in it (I don't think it did), it has a TIMED OUT unit test | 22:44 |
JayF | TheJulia: with my fix > 2023-06-26 22:43:51.725855 | ubuntu-focal | {6} ironic.tests.unit.db.sqlalchemy.test_migrations.TestMigrationsMySQL.test_walk_versions [17.414814s] ... ok | 22:50 |
JayF | that's on py38, assumedly the worst case | 22:50 |
JayF | 33.96s on py310 | 22:51 |
JayF | maybe it is just more environmental than anything else | 22:51 |
JayF | still not as bad as 44s | 22:51 |
JayF | https://review.opendev.org/c/openstack/ironic/+/886911 passed check, to disable the other flakey tests | 22:52 |
iurygregory | ack | 22:55 |
* JayF ==> EOD | 22:59 | |
clarkb | JayF: done I've aksed zuul to clean them up | 23:18 |
NobodyCam | crazy question. anyone know if meta-data service (169.254.169.254) should work in rescue mode? I have it working in normal deployed image, but not in rescue mode | 23:29 |
TheJulia | ... I guess it mgiht | 23:45 |
TheJulia | wellit should | 23:45 |
TheJulia | well | 23:45 |
TheJulia | It depends on *how* your configured I guess | 23:45 |
TheJulia | should it, likely not but there is no requirement there | 23:46 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!