iurygregory | seems like is not finding the cirros image .-. | 00:08 |
---|---|---|
TheJulia | Gah, so devstack broke us? | 00:11 |
iurygregory | don't think so | 00:16 |
iurygregory | seems like we can't find the cirros image | 00:16 |
iurygregory | but grenade can find and we pin to the same version | 00:16 |
iurygregory | https://zuul.opendev.org/t/openstack/build/3db0d75c34274ba68c86ea0858cb1cd4/log/job-output.txt#33887 | 00:17 |
iurygregory | IRONIC_IMAGE_NAME=non-existent-image | 00:17 |
iurygregory | what?! | 00:18 |
iurygregory | we probably have something tricking us... | 00:18 |
iurygregory | ok, seems like the job is trying to pull 0.5.2 ... | 00:19 |
TheJulia | I’m at dinner, but I would go look at the change history for devstack | 00:21 |
iurygregory | yeah | 00:22 |
iurygregory | there was a change 13h ago | 00:22 |
iurygregory | to use cirros 0.6.2 | 00:22 |
iurygregory | the strange thing is that I see 0.5.2 in the bfv job, so there is something weird | 00:22 |
iurygregory | but I will try a patch changing to 0.6.2 to see how it goes | 00:23 |
iurygregory | ok, I was looking at stable and master... | 00:29 |
* iurygregory checking one at a time | 00:29 | |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic master: Fix IRONIC_IMAGE_NAME=non-existent-image https://review.opendev.org/c/openstack/ironic/+/886790 | 01:05 |
iurygregory | TheJulia, hopefully this will do | 01:05 |
iurygregory | but I'm not 100% sure about the impact on us, just a workaround so we can have our CI "working" | 01:06 |
TheJulia | lgtm, if you want I'll even +A it ;) | 01:06 |
TheJulia | then again, we should wait on CI | 01:06 |
iurygregory | yeah | 01:07 |
iurygregory | will be good to have +A, but let's see how it goes | 01:07 |
TheJulia | Anyway, I'm home from dinner, I'm going to go exercise, ping me if it has results in the next few hours if your still around | 01:07 |
TheJulia | I'll try to check it before going to bed | 01:07 |
iurygregory | ack | 01:07 |
iurygregory | I can also ping stevebaker[m] XD | 01:07 |
* stevebaker[m] is summoned | 01:08 | |
TheJulia | heh | 01:09 |
iurygregory | it worked XD | 01:09 |
iurygregory | it seems it's also affecting stable... | 01:09 |
iurygregory | yay? | 01:09 |
iurygregory | because the change is in tempest... | 01:10 |
TheJulia | maybe revert the change? | 01:13 |
TheJulia | then again, maybe that is too much drama | 01:14 |
iurygregory | yeah | 01:15 |
iurygregory | let's see what they think about and discuss, for now let's have a workaround | 01:15 |
iurygregory | ok, now time to go back to the RedfishFirmwareInterface | 01:29 |
iurygregory | greennnnn | 01:58 |
iurygregory | <happy dance> | 01:58 |
TheJulia | woot | 02:00 |
iurygregory | https://zuul.opendev.org/t/openstack/status#886790 | 02:10 |
opendevreview | Verification of a change to openstack/ironic master failed: Fix IRONIC_IMAGE_NAME=non-existent-image https://review.opendev.org/c/openstack/ironic/+/886790 | 06:10 |
opendevreview | Merged openstack/ironic master: Fix IRONIC_IMAGE_NAME=non-existent-image https://review.opendev.org/c/openstack/ironic/+/886790 | 09:01 |
iurygregory | good morning Ironic | 10:48 |
TheJulia | Good morning | 13:10 |
opendevreview | Verification of a change to openstack/ironic master failed: Add DB API for Firmware and Object https://review.opendev.org/c/openstack/ironic/+/883062 | 14:59 |
JayF | at least it's not unit test/migrations failures | 15:00 |
TheJulia | I was sort of afraid to look | 15:05 |
TheJulia | Okay… | 15:07 |
JayF | I have a node being held next time our unit tests failed | 15:07 |
JayF | my afternoon is going to be getting on that node, trying to repro on demand, and then debugging from there | 15:07 |
TheJulia | ++ | 15:07 |
JayF | ...that hold hasn't caught anything yet | 15:08 |
JayF | lolsob | 15:08 |
JayF | I'm about to go to my fri morning meetings, maybe I need to recheck the world and hope I catch something lol | 15:09 |
TheJulia | Enjoy, at breakfast and when I get back my plan is to do a ton of rebasing | 15:10 |
opendevreview | Verification of a change to openstack/ironic master failed: Add DB API for Firmware and Object https://review.opendev.org/c/openstack/ironic/+/883062 | 16:13 |
opendevreview | Julia Kreger proposed openstack/ironic master: Utilize the JSON-RPC port https://review.opendev.org/c/openstack/ironic/+/879215 | 16:17 |
TheJulia | some really weird failures the past few days | 16:43 |
TheJulia | like... bizzar | 16:43 |
iurygregory | I like the fact we get the message twice lol | 16:57 |
iurygregory | the second was because py38-arm64 failed | 16:57 |
opendevreview | Julia Kreger proposed openstack/ironic master: Disable spanning tree https://review.opendev.org/c/openstack/ironic/+/886863 | 17:12 |
opendevreview | Verification of a change to openstack/ironic master failed: Revert "Disabling test_upgrade_twice temporarily for CI fix" https://review.opendev.org/c/openstack/ironic/+/886196 | 17:36 |
TheJulia | oh wow | 17:46 |
TheJulia | I've randomly reproduced the test_migrations failure | 17:46 |
iurygregory | I like the part "randomly " | 17:47 |
iurygregory | :D | 17:47 |
iurygregory | because that's exactly how our CI is reproducing the failures :D | 17:48 |
TheJulia | well | 17:49 |
TheJulia | I was like "we keep worrying about the WAL pragma change" so I was like "I'll just bypass it for unit tests! | 17:49 |
TheJulia | and *boom* | 17:49 |
TheJulia | everything else passes with it disabled, except the a migration check | 17:49 |
iurygregory | my patch just failed in test_models_sync | 17:50 |
iurygregory | https://afc51e22935c842dcb5d-192c01edda8011523fd78225d02322dc.ssl.cf2.rackcdn.com/883062/16/check/openstack-tox-py38/79d5289/testr_results.html | 17:50 |
iurygregory | pymysql.err.OperationalError: (1051, "Unknown table 'wrcdmblqhg.conductors'") | 17:51 |
iurygregory | O.o | 17:51 |
iurygregory | I really don't like this | 17:52 |
iurygregory | alembic.autogenerate.compare_metadata(mc, self.get_metadata())) | 17:52 |
iurygregory | }}} | 17:52 |
iurygregory | ops | 17:52 |
iurygregory | https://paste.opendev.org/show/b2A4BnIY8yzVEjFn0gkA/ | 17:52 |
TheJulia | https://paste.opendev.org/show/820446/ <-- might actually be different | 17:53 |
TheJulia | and of course, I actually have mysql running | 17:54 |
TheJulia | yeah, I just spotted that too | 17:56 |
opendevreview | Julia Kreger proposed openstack/ironic master: Fix SQLAlchemy engine connection listener https://review.opendev.org/c/openstack/ironic/+/885797 | 17:57 |
opendevreview | Julia Kreger proposed openstack/ironic master: Disable WAL Pragma for Unit Testing https://review.opendev.org/c/openstack/ironic/+/886865 | 17:57 |
TheJulia | oh, I sort of see what is going on | 18:01 |
TheJulia | we're basically requiring the same field to be a unique constraint across tow object models where one demands the other | 18:01 |
TheJulia | s/tow/two/ | 18:01 |
iurygregory | '-' | 18:01 |
iurygregory | in alocations and nodes? | 18:02 |
TheJulia | yeah | 18:04 |
TheJulia | both ultimately have a constraint on conductors.id | 18:04 |
TheJulia | for the same value in different tables | 18:04 |
TheJulia | and Allocations may be disjointed from nodes, so it is sort of a weird relationship | 18:05 |
TheJulia | running unit tests to see if I've removed the SAWarning | 18:05 |
iurygregory | fingers crossed | 18:07 |
TheJulia | the migrations error is | 18:08 |
TheJulia | weird | 18:08 |
TheJulia | iurygregory: do we have anything like what I got in CI that you've seen recently? | 18:08 |
NobodyCam | Good morning Ironic folks, and ofc... | 18:12 |
NobodyCam | TGIF | 18:12 |
TheJulia | indeed | 18:12 |
TheJulia | iurygregory: nope, didn't get rid of it | 18:12 |
JayF | TheJulia: did you just fix unit tests? or is this something different | 18:12 |
NobodyCam | o/ TheJulia | 18:12 |
JayF | damn | 18:12 |
JayF | I shouldn't have typed the jinx in | 18:12 |
TheJulia | heh | 18:12 |
NobodyCam | morning JayF | 18:12 |
TheJulia | well, looks like a few different things going on | 18:13 |
TheJulia | I only fixed the model | 18:13 |
TheJulia | oh... that is why | 18:13 |
iurygregory | I have some links of the random failures on my patch if you want TheJulia | 18:16 |
iurygregory | the last one was on test_models_sync https://afc51e22935c842dcb5d-192c01edda8011523fd78225d02322dc.ssl.cf2.rackcdn.com/883062/16/check/openstack-tox-py38/79d5289/testr_results.html | 18:16 |
TheJulia | looks like we need to actually delete the constraint from the db schema | 18:16 |
iurygregory | interesting | 18:17 |
iurygregory | https://zuul.opendev.org/t/openstack/builds?job_name=openstack-tox-py38&job_name=openstack-tox-py39&job_name=openstack-tox-py310&project=openstack%2Fironic&branch=master&result=FAILURE&skip=0 | 18:20 |
iurygregory | a list of failed jobs *py3* | 18:20 |
iurygregory | I'm leaving early today, since I was working till 1:30am | 18:23 |
iurygregory | but I will be checking patches later today o/ | 18:23 |
* TheJulia tries to figure out what the contsraint name *actually* is | 18:38 | |
JayF | fungi: https://zuul.opendev.org/t/openstack/nodes I don't see a node held for me, but https://review.opendev.org/c/openstack/ironic/+/883062 just failed | 18:50 |
JayF | fungi: I wonder if the autohold is not great | 18:50 |
JayF | (you were worried about the filter you were using iirc) | 18:50 |
TheJulia | interesting the migration tests alone take 70 seconds on my desktop | 18:52 |
fungi | JayF: yeah, i suspect the job name can't be wildcarded and we'll need to pick a specific job | 19:09 |
JayF | can we hold one of each, maybe? then I can get you to drop the autohold/held nodes when/if we catch one | 19:10 |
fungi | or set multiple autoholds and then cancel the others after one catches something | 19:10 |
JayF | that's what I'm trying to suggest | 19:10 |
fungi | yeah, our comments crossed in the aether | 19:10 |
TheJulia | okay got a fix for the sawarning | 19:28 |
TheJulia | it *basically* is both constraints | 19:28 |
TheJulia | we can't have mappings going both ways at the same time, basically | 19:28 |
TheJulia | well... nope | 19:32 |
fungi | is it mostly job timeouts or failed results you're tracking down? | 19:41 |
fungi | JayF: is the list of jobs you're interested in catching failures for still the same as earlier in the week? | 19:45 |
fungi | also note that we can't limit autoholds to specific pipelines, so if it's a job that also runs in check and someone uploads an obviously broken patch then we may end up holding a node for that | 19:47 |
fungi | which is why we often limit it to a specific change and just recheck that in order to reproduce failures | 19:47 |
opendevreview | Julia Kreger proposed openstack/ironic master: Handle SAWarning around allocations FK Constratins https://review.opendev.org/c/openstack/ironic/+/886871 | 19:53 |
TheJulia | generally across most of our runs we've got a mysterious hang occuring, and I believe that is what JayF is tryign to hunt down | 19:58 |
TheJulia | specifically with openstack-tox-py39 | 19:59 |
JayF | and 38 | 19:59 |
JayF | I've seen it on 310 too | 19:59 |
JayF | fungi: lets go for https://review.opendev.org/c/openstack/ironic/+/886196 | 20:01 |
JayF | fungi: and TIMED_OUT or FAILURE is fine to catch, I think at this point we've fixed the timeouts so these are actual-failures now | 20:01 |
fungi | well, i was asking what job results you were investigating in order to attempt to reverse engineer a list of failing job names from the zuul builds dashboard, the autohold feature doesn't allow for matching on job result | 20:02 |
JayF | ah, yeah use that change | 20:02 |
JayF | it reverts the commenting out of the most fail-y test | 20:03 |
JayF | so you get better chances of failure | 20:03 |
JayF | and I will recheck that one :D | 20:03 |
fungi | but if we're going to scope the autohold to change 886196 then i can just leave out the job name field and catch any unsuccessful job for it | 20:03 |
TheJulia | JayF: does https://paste.opendev.org/show/820446/ look familiar ? | 20:03 |
JayF | that is how the failures present now that rpittau put in timeouts | 20:03 |
JayF | that is "the failure" | 20:03 |
JayF | because the fixture timeout fires and uncleanly shuts down the sql conn | 20:04 |
fungi | never mind, i'm wrong. job name is still required (and has to be an exact match). i'll set it for openstack-tox-py38 | 20:05 |
fungi | | 0000000220 | openstack | opendev.org/openstack/ironic | openstack-tox-py38 | refs/changes/96/886196/.* | 1 | JayF investigating database migration hangs/timeouts | | 20:06 |
JayF | I'm simultaneously pleased and annoyed lol | 20:06 |
JayF | how do I gain access to the machine? | 20:07 |
fungi | that's not a machine, that's an autohold | 20:07 |
TheJulia | JayF: you ask when the node has been caught | 20:07 |
TheJulia | thing of it as like fishing for a failing CI node | 20:07 |
TheJulia | and the autohold is the hook | 20:07 |
fungi | once you get openstack-tox-py38 to fail on 886196 i'll add your ssh public key as allowed for authenticating to the root account on the job node we end up with | 20:08 |
TheJulia | so once caught, talk to fungi :) | 20:08 |
JayF | I thought my recheck immedaitely caught one | 20:08 |
JayF | which is why I would've been sad lol | 20:08 |
JayF | I'll recheck that through the weekend, I'm sure it'll catch a failure | 20:08 |
JayF | and if not ... good? maybe? | 20:08 |
opendevreview | Julia Kreger proposed openstack/ironic master: Handle SAWarning around allocations FK Constratins https://review.opendev.org/c/openstack/ironic/+/886871 | 20:50 |
TheJulia | i think I have reproduced the hang | 21:19 |
TheJulia | .... | 21:19 |
TheJulia | https://paste.opendev.org/show/820448/ | 21:21 |
TheJulia | connection 316 is me trying to dump the db | 21:21 |
TheJulia | since the nodes table is gone | 21:21 |
JayF | drop table conductors;? wtf | 21:22 |
TheJulia | it is the test cleaning up | 21:22 |
TheJulia | but even then, that is locked | 21:22 |
TheJulia | or blocked, to be precise | 21:22 |
JayF | | 316 | root | localhost | rhakocwbbm | Query | 73 | Waiting for table metadata lock | LOCK TABLES `conductors` READ /*!32311 LOCAL */,`deploy_template_steps` READ /*!32311 LOCAL */,`deploy_templates` READ /*!32311 LOCAL */,`firmware_information` READ /*!32311 LOCAL */,`node_history` READ /*!32311 LOCAL */,`node_inventory` READ /*!32311 LOCAL | 21:22 |
JayF | */,`node_tags` READ /*!32311 LOCAL */,`node_traits` READ /*!32311 LOCAL */,`nodes` READ /*!32311 LOCAL */,`portgroups` READ /*!32311 LOCAL */,`ports` READ /*!32311 LOCAL */,`volume_connectors` READ /*!32311 LOCAL */,`volume_targets` READ /*!32311 LOCAL */ | 0.000 | | 21:22 |
JayF | that's the real winner | 21:22 |
JayF | holy paste | 21:22 |
TheJulia | that is me trying to mysqldump | 21:22 |
JayF | oh okay | 21:22 |
JayF | so it's deadlocked but nothing in the process list | 21:22 |
JayF | what the hell | 21:22 |
JayF | SHOW ENGINE INNODB STATUS \G | 21:23 |
JayF | look for transactions in that list | 21:23 |
TheJulia | I've got 22 nodes in the db | 21:23 |
JayF | .o(is this even innodb?) | 21:23 |
TheJulia | it is | 21:23 |
JayF | select from performance_schema.metadata_locks might help too | 21:23 |
TheJulia | if you have mysql installed | 21:23 |
TheJulia | the tests use it | 21:23 |
JayF | IDK if my comments are helpful, I have no idea how much of this stuff you already know, my assumption is all of it but I don't wanna not say and something get missed lol | 21:24 |
TheJulia | i don't have performance_schema | 21:24 |
JayF | I'm excited you have a live one with the locked db | 21:24 |
JayF | anyhting in the innodb engine status? | 21:24 |
JayF | particularly hung txns? | 21:24 |
TheJulia | not really | 21:26 |
JayF | what mariadb/mysql version? | 21:26 |
TheJulia | hmm | 21:27 |
TheJulia | https://paste.opendev.org/show/bQ7UYpuZOyA3VufQAlfn/ | 21:27 |
TheJulia | 312 and 308 seem suspect | 21:27 |
JayF | if we don't solve this in short order, we need to persist these findings in the bug fwiw | 21:28 |
TheJulia | mysql Ver 15.1 Distrib 10.1.48-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2 | 21:28 |
JayF | those two threads look deadlocked against each other | 21:28 |
JayF | 312/308 | 21:28 |
JayF | when would we even have, in CI, two overlapping txns? | 21:28 |
JayF | having two overlapping txns almost seems like itself might be the bug, yeah | 21:29 |
JayF | for purposes of migrations that at least seems scary to me, idk | 21:29 |
* TheJulia pokes aroudn the db to try and figure out what tests are actually running | 21:29 | |
JayF | probably one of the DB migrations that has to migrate data around | 21:29 |
JayF | that's my hunch at least given this | 21:29 |
TheJulia | I'm not sure about that, given I've got 22 entires in the nodes table | 21:29 |
TheJulia | one with a ton of mock data | 21:29 |
JayF | it was a guess anyway | 21:29 |
JayF | don't we have one of those migrations for node inventory though? | 21:30 |
TheJulia | model schema, not actual data afaik | 21:30 |
* JayF remembers node-something getting shifted around | 21:30 | |
TheJulia | hmmmmmmmm | 21:30 |
TheJulia | https://paste.opendev.org/show/b4Mk6wMPUZioAyDYnb0P/ | 21:32 |
JayF | node.conductor_group being not null but empty string | 21:33 |
JayF | looks really really weird to me | 21:33 |
TheJulia | https://www.irccloud.com/pastebin/dWVlf1Mf/ | 21:34 |
JayF | I'll also note, you likely can't see whatever transformation the txn was trying to perform | 21:34 |
TheJulia | id 123 seems like a huge issue | 21:34 |
JayF | until/unless you kill one and free the deadlock | 21:34 |
JayF | or you interrupted a migration | 21:34 |
JayF | and that is being populated | 21:34 |
JayF | but ... it should never be in this state | 21:34 |
JayF | if node.uuid were non-null, that should be a state that's "inside" a txn, yeah? | 21:35 |
JayF | so it should never appear when querying from cli | 21:35 |
TheJulia | https://www.irccloud.com/pastebin/VasC00Hz/ | 21:35 |
JayF | so we are likely saving an empty node somewhere | 21:35 |
JayF | yeah? | 21:35 |
TheJulia | yeah | 21:35 |
TheJulia | well, many | 21:35 |
TheJulia | most tests do skeleton nodes like these and only set the necessary details | 21:35 |
JayF | here's the $1M question: why now | 21:36 |
TheJulia | like that one is named node1, has a flat network interface, and nothing else | 21:36 |
TheJulia | so | 21:36 |
TheJulia | what is super suspect is we merged firmware information schema (sorry iury!) and it all sort of started in the wake of that | 21:36 |
TheJulia | and now we've got a test in there that has hung it seems with data still there | 21:37 |
TheJulia | *could* be a coincidence | 21:37 |
TheJulia | but worth chasing down | 21:37 |
JayF | oh oh | 21:37 |
JayF | this is | 21:37 |
JayF | it's missing the delete magic | 21:37 |
JayF | in the high level node delete | 21:37 |
TheJulia | node_id 123 missing uuid could just be a red herring | 21:37 |
* JayF has a strong idea where to look | 21:37 | |
JayF | now if I only knew where that code was LOL | 21:37 |
JayF | brain does not have an index | 21:37 |
TheJulia | oh wow | 21:38 |
TheJulia | so, yeah | 21:38 |
TheJulia | you can't delete the firmware information because it has a unique constraint | 21:38 |
TheJulia | so if it is missed, and somehow, somewhat it doesn't get picked up, yeah, i could see it deadlocking if we don't | 21:38 |
TheJulia | oh.... uhhh object helpers I *think* | 21:39 |
TheJulia | so, you *can* end up with a null uuid field on a node if one hasn't been saved out yet | 21:40 |
TheJulia | so it could be something with deadlocking already causing it to be the case | 21:40 |
JayF | node was changed without obj version being bumped | 21:41 |
TheJulia | bingo | 21:41 |
JayF | firmware_interface was added to node | 21:42 |
JayF | *NodeBase | 21:42 |
TheJulia | we lack firmware_information delete on destroy_node | 21:42 |
JayF | obj version wasn't bumped | 21:42 |
JayF | so it looks like there are a couple of pieces missing there then | 21:42 |
JayF | we shoudl probably fix both identified issues in the same PR, yeah? | 21:42 |
JayF | you want it or shoudl I take it | 21:42 |
TheJulia | they should be separate I *think* | 21:42 |
TheJulia | but let me see where my current repo is at since I hit this state trying to test a rebase | 21:43 |
TheJulia | I'm making a new clone | 21:43 |
JayF | TheJulia: I have it | 21:44 |
JayF | just let me do it, I'm almost there | 21:44 |
TheJulia | k | 21:44 |
TheJulia | so all of the migrations explain all of the nodes | 21:45 |
TheJulia | yup, and we're bombing out on the test added as part of that original change | 21:45 |
JayF | if this fixes it I might cry no joke | 21:46 |
JayF | so much stress looking for this crap | 21:46 |
TheJulia | so it looks like the db was done, model was done, but it was not added to the object, yet! | 21:46 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Fix node deletion with FirmwareInformation https://review.opendev.org/c/openstack/ironic/+/886878 | 21:46 |
TheJulia | https://review.opendev.org/c/openstack/ironic/+/883062 <-- still in flight | 21:46 |
JayF | FirmwareInformation was added to node_base though | 21:47 |
JayF | er, NodeBase already | 21:47 |
JayF | doesn't that add it to the Node object already? | 21:47 |
TheJulia | models != objects | 21:47 |
JayF | ah yeah, that's right | 21:47 |
JayF | we have separate db models and object models | 21:47 |
TheJulia | when we create the object, we copy from the db model response | 21:47 |
TheJulia | yup | 21:47 |
TheJulia | yeah, I think your change will do it | 21:48 |
JayF | like, can we unit test for *this exact case* | 21:48 |
JayF | if we can, I'm not sure how | 21:48 |
TheJulia | it would have to look at the tables which exist and then likely look at every schema to hunt down where constraints are | 21:49 |
TheJulia | would be a good challenge I guess | 21:49 |
TheJulia | I don't see a practical reason why we can't | 21:49 |
TheJulia | it just might not "pretty" | 21:49 |
JayF | I'll create a bug for it and label it good-first-issue ;) | 21:49 |
TheJulia | ++ | 21:49 |
JayF | TheJulia: I shoulda co-authored-by you on that patch, do you care enough for me to toll the CI? | 21:51 |
TheJulia | it just got started :) | 21:52 |
TheJulia | but up to you | 21:52 |
JayF | your gmail is what you use, yeah? | 21:52 |
TheJulia | re-testing locally with your patch since I hit it kind of head on | 21:52 |
TheJulia | juliaashleykreger@gmail.com | 21:52 |
JayF | yep I do remember how to spell your last name :D | 21:53 |
JayF | I should by now lmao | 21:53 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Fix node deletion with FirmwareInformation https://review.opendev.org/c/openstack/ironic/+/886878 | 21:53 |
JayF | holy hell this whole thing might be an "invisible work of openstack" blogpost all on it's own | 21:54 |
TheJulia | just remember "sometimes it is good for someone who does dev work on the old desktop in a corner which has all of the databases installed ready to rock and roll" | 21:54 |
TheJulia | I wouldn't have hit it if I didn't have mysql installed and pre-configured for jobs to run against | 21:54 |
TheJulia | and... it looks like I might have hung again | 21:55 |
TheJulia | lets see what is going on | 21:55 |
JayF | https://bugs.launchpad.net/ironic/+bug/2024941 | 21:56 |
JayF | oh no | 21:56 |
TheJulia | and we're back in a epoll loop wait on the runner | 21:56 |
* TheJulia reopens mysql | 21:56 | |
JayF | did you purge your db? | 21:56 |
TheJulia | no | 21:56 |
JayF | that is likely why it's broken | 21:56 |
JayF | still has bad data from last run | 21:56 |
TheJulia | it creates random database names by default | 21:57 |
JayF | (I hope?) | 21:57 |
JayF | wait, what does? | 21:57 |
TheJulia | the test runner | 21:57 |
TheJulia | https://www.irccloud.com/pastebin/sarIz6ki/ | 21:57 |
TheJulia | see how my db name is now lehsphwiee | 21:57 |
TheJulia | now, with force! lehsphwiee!!! | 21:58 |
JayF | https://opendev.org/openstack/ironic/src/branch/master/tools/test-setup.sh I guess I assumed we use the created db | 21:58 |
JayF | but I see we grant *.* | 21:58 |
JayF | so we can yolo all the dbs we want | 21:58 |
TheJulia | each runner gets its own db land to play in | 21:58 |
TheJulia | err, I still have firmware_information | 21:58 |
TheJulia | .... and I have another thing trying to drop conductors | 21:58 |
TheJulia | hmmmmmmmmmm | 21:59 |
JayF | having firmware_information is not wrong, it remaining after a node is deleted is | 21:59 |
TheJulia | true | 21:59 |
TheJulia | so we're deep in the migration run I guess | 22:00 |
TheJulia | https://www.irccloud.com/pastebin/95waW8te/ | 22:00 |
JayF | I'm just stunned this didn't fix it | 22:00 |
JayF | we fixed something for sure | 22:00 |
JayF | I wonder | 22:00 |
JayF | you wanna just revert firmware_informaiton | 22:00 |
JayF | and see if it's no longer repro'd? | 22:01 |
TheJulia | I'm going to comment out the migration check to start | 22:01 |
JayF | ack | 22:01 |
TheJulia | I'm wondering if there is just something in there we hang on | 22:01 |
JayF | I spent some time reading thru all the migrations, basically re-reviewing them all | 22:02 |
JayF | not now, but before | 22:02 |
JayF | and I saw nothing obvious | 22:02 |
TheJulia | it has got to be somewhere in that because the node name with a bunch of "a"'s | 22:03 |
TheJulia | which is a migration check | 22:03 |
JayF | what about parent_node | 22:03 |
JayF | it's nullable, no constraint | 22:04 |
JayF | I could set a parent_node of lol if I wanted | 22:04 |
JayF | so it's unlikely to be that, but I was hoping | 22:04 |
* TheJulia looks at the db tests iury wrote and scratches her brain for a minute going "something doesn't look right" | 22:05 | |
TheJulia | oh, nvmd | 22:05 |
TheJulia | I see it | 22:05 |
TheJulia | or at least what my brain was going "wut" with | 22:06 |
TheJulia | okay, disabled the test entirely, lets see if we hang | 22:06 |
TheJulia | did not hang | 22:09 |
JayF | can you tell me exactly what you commented? | 22:09 |
JayF | like a link woudl be idea | 22:09 |
JayF | but line no X in Y file is fine | 22:09 |
TheJulia | I just added a pass after _check_163040c5513f | 22:09 |
JayF | fyi this is the unit test bug I created, for someone to check for the not-deleted-properly method https://bugs.launchpad.net/bugs/2024941 | 22:10 |
JayF | this feels more like the bug is exposed in the test | 22:13 |
JayF | than the test itself being broken | 22:13 |
JayF | but I'm not sure | 22:13 |
TheJulia | i think the issue is in the version walking, we're injecting an artificial entry | 22:13 |
TheJulia | but I just don't understand why | 22:13 |
TheJulia | what are we achieving by that | 22:13 |
JayF | I don't see it | 22:14 |
TheJulia | https://review.opendev.org/c/openstack/ironic/+/883031/13/ironic/tests/unit/db/sqlalchemy/test_migrations.py#1316 | 22:15 |
TheJulia | I'm guessing we don't actually tear down with the dbapi here *because* that is the next level up data model wise | 22:15 |
JayF | I don't follow | 22:15 |
JayF | the initial_version on Ln 1316 is just data, right? part of that firmware_information table | 22:16 |
JayF | documenting the first version of that fw we saw, yeah | 22:16 |
JayF | and it's non-null so it has to be set to create a thing | 22:16 |
TheJulia | https://meet.google.com/vnq-drgr-xom | 22:16 |
TheJulia | easier perhaps ^ | 22:16 |
TheJulia | muahahahaha I think we found it | 22:51 |
opendevreview | Julia Kreger proposed openstack/ironic master: Fix test_migrations with firmware information. https://review.opendev.org/c/openstack/ironic/+/886881 | 23:05 |
JayF | https://review.opendev.org/c/openstack/ironic/+/886881 is the expected fix for our unit tests; I will recheck it several times over the weekend to ensure it's passing reliably. It's my intention to land this Monday morning with +2s from TheJulia and I, despite our work on it, if it has no further reviews | 23:30 |
JayF | if you think that's a terrible idea, go put a -1 on that patch, otherwise, review it and please +2 it | 23:30 |
JayF | thank you | 23:30 |
* TheJulia suspects steve may do the needful :) | 23:30 | |
TheJulia | so, anyone have feels on just obliterating the test_port_changed_client_id test ? | 23:32 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Revert "Disabling test_upgrade_twice temporarily for CI fix" https://review.opendev.org/c/openstack/ironic/+/886196 | 23:33 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!