@newbie23:matrix.org | Hi guys, will a Zuul pipeline rename cause the existing queue to be emptied? What about a gate pipeline? | 06:49 |
-@gerrit:opendev.org- Peter Strunk proposed: [zuul/zuul] 903808: zuul_stream: add FQCN for windows command and shell https://review.opendev.org/c/zuul/zuul/+/903808 | 06:52 | |
@clarkb:matrix.org | I don't think you can just change the name of a pipeline because zuul verifies the config is valid. This means no project can be using the pipeline when you change its name. So yes, I think it will be emptied first, but only because you need to make the config valid along the way | 15:23 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253 | 16:04 | |
@clarkb:matrix.org | corvus: ^ I noticed that the exception is an error connecting to 'localhost' which I think is the local socket not the tcp connection to 127.0.0.1. The fixture defaults to 127.0.0.1 and I don't see anywhere that we might set that to localhost. But I added some extra logging to see if maybe this is some local state issue with the connection target | 16:05 |
@clarkb:matrix.org | it's possible that the mysql packages on ubuntu allow connections to localhost on the socket but mariadb does not by default? | 16:05 |
@clarkb:matrix.org | and that could explain the problem | 16:05 |
@jim:acmegating.com | wouldn't that affect all the tests? this is a sporadic failure, right? | 16:08 |
@clarkb:matrix.org | yes it is a sporadic failure. I'm wondering if something is modifying the environment to use localhost or maybe localhost is the default connection host and it's falling back? | 16:09 |
@clarkb:matrix.org | but an actual connection to localhost that mysql allows and mariadb doesn't would explain the error I think | 16:10 |
@jim:acmegating.com | got it | 16:14 |
@clarkb:matrix.org | digging around on the held node I actually can successfully connect with `mysql -u openstack_citest -h localhost -p` | 17:03 |
@clarkb:matrix.org | looks like this is less likely to be the issue | 17:04 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253 | 17:56 | |
@clarkb:matrix.org | pymysql debugs to stdout when the DEBUG attribute is set to True | 17:59 |
@clarkb:matrix.org | a bit unorthodox but I've toggled that around connection setup and maybe that will shed more light on this | 18:00 |
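The toggle described above would look roughly like this sketch; pymysql's packet dumping is controlled by a module-level flag, and the connection parameters here are illustrative assumptions, not the fixture's actual values:

```python
# Minimal sketch of toggling pymysql's stdout packet dumping around
# connection setup. pymysql.connections.DEBUG is a module-level flag;
# host/user/password are illustrative assumptions.
import pymysql
import pymysql.connections

pymysql.connections.DEBUG = True   # dump raw client/server packets to stdout
try:
    conn = pymysql.connect(
        host="127.0.0.1",              # the fixture defaults to 127.0.0.1
        user="openstack_citest",
        password="openstack_citest",   # assumed placeholder credential
    )
finally:
    pymysql.connections.DEBUG = False  # quiet again once the connection is up
```

Flipping a module-level global is the "a bit unorthodox" part: it affects every connection in the process, which is tolerable inside a test fixture but not something to leave enabled.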
@clarkb:matrix.org | also I think every error has been on noble so far. python3.11 and ubuntu jammy haven't hit it with mariadb 10.11 | 18:00 |
-@gerrit:opendev.org- Clark Boylan proposed: | 18:12 | |
- [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253 | ||
- [zuul/zuul] 945387: DNM does using mysql unix socket work better than tcp https://review.opendev.org/c/zuul/zuul/+/945387 | ||
@clarkb:matrix.org | that was even more verbose than I anticipated. This should make it quieter and only show the stdout on failing test cases | 18:12 |
@clarkb:matrix.org | fyi mnaser pointed out https://github.com/pypa/setuptools/pull/4870. From a quick check I think zuul's setup.cfg is ok but nodepool's isn't | 18:47 |
@clarkb:matrix.org | https://zuul.opendev.org/t/zuul/stream/3e52604eeab14c34ba972e755854e60e?logfile=console.log this shows that we're getting the error from mariadb before we even attempt to authenticate. We're getting the error during the read server info step | 19:06 |
@clarkb:matrix.org | it seems like, despite tools/test-setup.sh setting up rules that allow access for openstack_citest from `%`, the server occasionally decides that this shouldn't be allowed | 19:07 |
@jim:acmegating.com | is that still feeling like some kind of internal mariadb race/lock issue? | 19:11 |
@clarkb:matrix.org | yes. I don't think the flush privileges we do is necessary in per-test db setup. So next round I'm going to drop that and see if it helps. Maybe when flushing privileges there is a race with new connections and clearing out the old config? | 19:14 |
@clarkb:matrix.org | since we have multiple test runners all flushing privileges each time they create a new db, that feels like it could be related | 19:17 |
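The per-test setup under discussion amounts to something like the following sketch (the helper and its names are hypothetical, not Zuul's actual fixture code); the point is that CREATE USER and GRANT take effect immediately, so the flush being dropped was never needed here:

```python
# Hypothetical per-test database setup: each test gets its own schema and
# user. Account-management statements (CREATE USER, GRANT) update the
# server's privilege caches themselves, so no FLUSH PRIVILEGES is issued.
import pymysql

def create_test_db(admin_conn, name, password):
    with admin_conn.cursor() as cur:
        # name comes from trusted test code, so direct interpolation is fine
        cur.execute(f"CREATE DATABASE {name}")
        # pymysql does client-side %-style formatting, so the literal '%'
        # host pattern must be doubled when parameters are passed
        cur.execute("CREATE USER %s@'%%' IDENTIFIED BY %s", (name, password))
        cur.execute(f"GRANT ALL PRIVILEGES ON {name}.* TO %s@'%%'", (name,))
```

With multiple runners executing this concurrently, the only thing a FLUSH PRIVILEGES at the end could add is a race window.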
@clarkb:matrix.org | ` Fatal Python error: take_gil: PyCOND_WAIT(gil->cond) failed` is a new one. That showed up in the last remaining test job for my current stack of debugging changes | 20:12 |
@clarkb:matrix.org | I think the job is going to timeout shortly | 20:13 |
-@gerrit:opendev.org- Clark Boylan proposed: | 20:14 | |
- [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253 | ||
- [zuul/zuul] 945387: DNM does using mysql unix socket work better than tcp https://review.opendev.org/c/zuul/zuul/+/945387 | ||
@mordred:waterwanders.com | Clark: if you use grant statements you should not need flush privileges. that's typically only needed if you grant privs by doing direct sql on the mysql access tables | 20:18 |
@mordred:waterwanders.com | aha: ```DELETE FROM mysql.user WHERE User='';``` - that'll do it. are they _still_ shipping that user by default? | 20:23 |
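To make the distinction concrete (a sketch assuming hypothetical admin credentials, not Zuul's code): account-management statements reload the privilege caches on their own, while raw SQL against the mysql.* grant tables, like the DELETE above, does not:

```python
# Sketch contrasting the two paths; the root credentials are assumptions.
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="root", password="secret")
with conn.cursor() as cur:
    # Account-management statements take effect immediately, no flush:
    cur.execute("CREATE USER 'someuser'@'localhost' IDENTIFIED BY 'pw'")
    cur.execute("GRANT SELECT ON somedb.* TO 'someuser'@'localhost'")

    # Direct writes to the grant tables bypass the in-memory caches, so
    # the server keeps enforcing the old privileges until told to reload:
    cur.execute("DELETE FROM mysql.user WHERE User = ''")
    cur.execute("FLUSH PRIVILEGES")  # needed only after edits like the DELETE
```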
@mordred:waterwanders.com | Clark: I left a comment on that DNM patch about something to look at. just fwiw | 20:33 |
@clarkb:matrix.org | mordred: ya I left the flush in test-setup.sh because it only runs once with no contention. I think mysql.user is an alias table for the real thing. And re myisam the idea is to force innodb in our own schemas iirc | 20:43 |
@mordred:waterwanders.com | cool | 20:43 |
@clarkb:matrix.org | but ya maybe this is a side effect of some initial setup though I think the stuff for openstack_citest is correct and that is the account we're failing to use (though it fails before we even provide any account info just when the client tries to read back server info) | 20:44 |
@clarkb:matrix.org | I'll get a link to that | 20:44 |
@clarkb:matrix.org | https://zuul.opendev.org/t/zuul/build/3e52604eeab14c34ba972e755854e60e/log/job-output.txt#4296-4312 you can see from that pymysql debug output it fails almost immediately upon connecting | 20:45 |
@clarkb:matrix.org | the traceback is most recent call first, then on the right it's the packet dump for the server response | 20:45 |
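From the client side the failure looks roughly like the sketch below; the 1130 error code (ER_HOST_NOT_PRIVILEGED, the usual host-ACL rejection) is an assumption about which code comes back, not something taken from these logs:

```python
# Hedged sketch: the connect itself raises OperationalError while reading
# the server greeting, before any credentials are sent.
import pymysql

try:
    conn = pymysql.connect(host="127.0.0.1", user="openstack_citest",
                           password="openstack_citest")  # assumed credentials
except pymysql.err.OperationalError as exc:
    code, message = exc.args
    if code == 1130:  # assumed: "Host ... is not allowed to connect"
        # the server rejected the host before authentication even started
        print(f"host ACL rejection during handshake: {message}")
    else:
        raise
```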
@mordred:waterwanders.com | oh yeah - I see you removed the flush from the code itself. there's another one in the cleanup method (comment left) | 20:45 |
@clarkb:matrix.org | oh yup. Is it safe to not flush privileges after the two drop statements? | 20:46 |
@clarkb:matrix.org | must be equivalent to create user so I think it is | 20:47 |
@clarkb:matrix.org | I'll get a new ps up once I have more data collected from this round of tests | 20:47 |
@mordred:waterwanders.com | it should be safe to not do it there, yeah | 20:47 |
@mordred:waterwanders.com | but yeah - that seems really weird - I would expect flush privs to be atomic anyway, but maybe it's not? | 20:48 |
@clarkb:matrix.org | it wouldn't surprise me if it is atomic when viewed via sql but maybe the connection management is doing magic and breaking that? | 20:48 |
@clarkb:matrix.org | but ya it is really hard to explain why connections from localhost would be refused very occasionally and those create/drop user and flush privileges are the only real place we're modifying the db state around that | 20:49 |
@clarkb:matrix.org | mordred: other interesting info is this happens regularly on ubuntu jammy with mariadb 10.6 and ubuntu noble with mariadb 10.11 but I haven't caught debian bookworm mariadb 10.11 doing it yet. Also mysql on ubuntu doesn't do it | 20:50 |
@clarkb:matrix.org | it's possible this is related to distro-specific mariadb configs as well | 20:50 |
@clarkb:matrix.org | I'm just slowly trying to add more debugging and rule things out as I go | 20:50 |
@mordred:waterwanders.com | maybe. ugh. ... are we setting ZUUL_MYSQL_HOST somewhere that I'm not seeing? | 20:50 |
@mordred:waterwanders.com | we _read_ it from the env in setUp | 20:50 |
@clarkb:matrix.org | mordred: no, that's why I added extra debugging to record the self.host and self.port values | 20:50 |
@clarkb:matrix.org | I thought maybe we were, and were trying to connect to something invalid, but it all checked out | 20:51 |
@clarkb:matrix.org | https://zuul.opendev.org/t/zuul/build/3e52604eeab14c34ba972e755854e60e/log/job-output.txt#4293 you can see that here | 20:51 |
@clarkb:matrix.org | this also seems like the sort of bug that most people would never notice if it was a race with flush privileges | 20:52 |
@mordred:waterwanders.com | yeah | 20:52 |
@clarkb:matrix.org | since most people set up accounts once (or very infrequently) then use them forever | 20:52 |
@mordred:waterwanders.com | fascinating that it translates 127.0.0.1 to localhost in the error message | 20:53 |
-@gerrit:opendev.org- Aurelio Jargas proposed: [zuul/zuul] 945405: Test role ensure-python-command https://review.opendev.org/c/zuul/zuul/+/945405 | 20:54 | |
@clarkb:matrix.org | `select * from information_schema.processlist;` shows localhost:portnumber too | 20:54 |
@clarkb:matrix.org | rather than 127.0.0.1:portnumber | 20:54 |
@clarkb:matrix.org | when you use the socket it just shows localhost | 20:54 |
@mordred:waterwanders.com | yeah. that's the original distinction - 127.0.0.1 is for tcp, localhost is for socket. it would just be nice to not normalize that in the logging | 20:55 |
@clarkb:matrix.org | ++ | 20:56 |
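For reference, the two connection modes in pymysql; the socket path is a typical Debian/Ubuntu default and an assumption here:

```python
# TCP vs unix-socket connections in pymysql. The server's logging shows
# both as "localhost", which is the normalization being complained about.
import pymysql

# TCP: shows up in processlist as localhost:<port>
tcp_conn = pymysql.connect(host="127.0.0.1", port=3306,
                           user="openstack_citest",
                           password="openstack_citest")  # assumed credentials

# Unix socket: shows up in processlist as plain localhost
sock_conn = pymysql.connect(unix_socket="/var/run/mysqld/mysqld.sock",
                            user="openstack_citest",
                            password="openstack_citest")
```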
@aureliojargas:matrix.org | Good idea. Done in https://review.opendev.org/c/zuul/zuul/+/945405. Is an empty change enough or do I have to explicitly call some job? | 20:57 |
@clarkb:matrix.org | An empty change is probably enough. You can see the jobs enqueued for your change at https://zuul.opendev.org/t/zuul/status and many of them should run nox | 20:59 |
@clarkb:matrix.org | you'll then be able to look at the log files for those jobs and check that the ensure-nox role ran through your proposed system and confirm it works that way in addition to the built-in tests | 20:59 |
@aureliojargas:matrix.org | nice, thanks! | 21:00 |
@clarkb:matrix.org | Aurelio Jargas: https://zuul.opendev.org/t/zuul/build/fb233076657a4f6194d23bbbf00c2464/log/job-output.txt#377-434 I think this is showing what I described | 21:04 |
@aureliojargas:matrix.org | In job `zuul-build-python-release` the new role is also being called via `ensure-pyproject-build`: https://zuul.opendev.org/t/zuul/build/0d4a5280b6e5448fad506b39e65c4355/log/job-output.txt#571-606 | 21:14 |
-@gerrit:opendev.org- Clark Boylan proposed: | 21:41 | |
- [zuul/zuul] 945234: Try to use mariadb in unittest again https://review.opendev.org/c/zuul/zuul/+/945234 | ||
- [zuul/zuul] 945413: Make MySQL test fixture more robust https://review.opendev.org/c/zuul/zuul/+/945413 | ||
-@gerrit:opendev.org- Clark Boylan proposed: | 21:45 | |
- [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253 | ||
- [zuul/zuul] 945387: DNM does using mysql unix socket work better than tcp https://review.opendev.org/c/zuul/zuul/+/945387 | ||
@clarkb:matrix.org | so I think the mysql fixture updates I'm making as part of testing this are valid for mysql or mariadb even if we don't switch. So I've gone ahead and pushed that into a base change that we can land if all continues to look well | 21:48 |
@clarkb:matrix.org | then we can keep rerunning tests against mariadb to see if that fixed it. The last run had failures but none from connection issues. But we may have gotten lucky, so more passes are good | 21:48 |
@clarkb:matrix.org | I broke the linters. But the code is valid so I'll let it run and report before pushing a fix | 21:50 |
@clarkb:matrix.org | ok that stack is looking a lot happier now. Maybe it is flush privileges. That seems like something that shouldn't break us, but a lot of jobs have succeeded and those that failed did so on other issues | 23:02 |
@clarkb:matrix.org | I'll wait for them all to report before pushing the linter fix so that we've got a record of that. Anyone still in contact with LinuxJedi? I wonder if this is the sort of behavior mariadb would be interested in hearing about | 23:02 |
@jim:acmegating.com | thanks Clark ! | 23:03 |
@mordred:waterwanders.com | Clark: LinuxJedi is no longer with mariadb - he's at WolfSSL doing embedded crypto for microcontrollers | 23:25 |
@clarkb:matrix.org | ah TIL | 23:28 |
@clarkb:matrix.org | in that case linuxjedi is unlikely to care too much | 23:28 |
@mordred:waterwanders.com | that said - I still know a few folks there. Would a summary of the issue be "On Mariadb, having multiple clients each running "flush privileges" as part of their business occasionally causes a new connection to fail with a host unauthorized error" | 23:29 |
@clarkb:matrix.org | that seems to be what we're converging on | 23:30 |
@clarkb:matrix.org | the use case here is test cases create a specific schema and user for use by that test and we can have up to ~6 test cases running at once. So occasionally it is possible a flush privileges overlaps with a new connection attempt | 23:30 |
@mordred:waterwanders.com | "This is behavior we've seen in a test farm with parallel tests running flush privs as part of their test setup. We've only seen it on mariadb and not on mysql. Removing the flush privs (it wasn't needed anyway) has removed the intermittent auth errors" | 23:30 |
@clarkb:matrix.org | removing flush privileges has made the error go away so far. It isn't deterministic so I can't say for sure yet that this is definitive | 23:31 |
@clarkb:matrix.org | they may have other ideas | 23:31 |
@clarkb:matrix.org | oh and we're being rejected in early connection setup when the client requests server info before we attempt to authenticate | 23:31 |
@mordred:waterwanders.com | yah - that's consistent with host-based acl error message | 23:32 |
@mordred:waterwanders.com | it's saying "nobody from localhost is authorized" - so it doesn't even get to user/password stuff yet | 23:33 |
@clarkb:matrix.org | seen with mariadb 10.6 on ubuntu jammy and 10.11 on ubuntu noble. Ubuntu's mysql 8 package on jammy and noble has been fine | 23:34 |
@clarkb:matrix.org | ya | 23:34 |
@clarkb:matrix.org | that is all distro packaged mariadb and mysql to be clear | 23:34 |
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 944274: Only consider live items on dequeue event https://review.opendev.org/c/zuul/zuul/+/944274 | 23:40 | |
@mordred:waterwanders.com | Clark: I have passed on information to one of the Mariadb support engineers who loves weird bugs | 23:43 |
@mordred:waterwanders.com | I thought briefly about submitting a bug - but that would require signing up for an account on a jira, and I don't care THAT much. So I've done bug-report-as-linkedin-message - and I think my transformation to corporate drone should be considered complete | 23:44 |
@clarkb:matrix.org | ha | 23:44 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 945418: Move zuul-nox-py311-multi-scheduler to the experimental pipeline https://review.opendev.org/c/zuul/zuul/+/945418 | 23:48 | |
-@gerrit:opendev.org- Clark Boylan proposed: | 23:48 | |
- [zuul/zuul] 945413: Make MySQL test fixture more robust https://review.opendev.org/c/zuul/zuul/+/945413 | ||
- [zuul/zuul] 945234: Try to use mariadb in unittest again https://review.opendev.org/c/zuul/zuul/+/945234 | ||
- [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253 | ||
- [zuul/zuul] 945387: DNM does using mysql unix socket work better than tcp https://review.opendev.org/c/zuul/zuul/+/945387 | ||
@clarkb:matrix.org | ok that should pass linting now and appears to be safe for mysql too. So I think we should land 945413 either way then look at landing 945234 if testing continues to look happy | 23:49 |
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 945104: Fix promote event handling for github changes https://review.opendev.org/c/zuul/zuul/+/945104 | 23:55 | |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 945419: Add logging of kubectl process group id to executor https://review.opendev.org/c/zuul/zuul/+/945419 | 23:57 |