Monday, 2025-03-24

@newbie23:matrix.orgHi guys, will a Zuul pipeline rename cause the existing queue to be emptied? What about a gate pipeline?06:49
-@gerrit:opendev.org- Peter Strunk proposed: [zuul/zuul] 903808: zuul_stream: add FQCN for windows command and shell https://review.opendev.org/c/zuul/zuul/+/90380806:52
@clarkb:matrix.orgI don't think you can just change the name of a pipeline because zuul verifies the config is valid. This means no project can be using the pipeline when you change its name. So yes, I think it will be emptied first, but that is a side effect of needing to keep the config valid along the way15:23
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/94525316:04
@clarkb:matrix.orgcorvus: ^ I noticed that the exception is an error connecting to 'localhost' which I think is the local socket not the tcp connection to 127.0.0.1. The fixture defaults to 127.0.0.1 and I don't see anywhere where we might set that to localhost. But I added some extra logging to see if maybe this is some local state issue with the connection target16:05
@clarkb:matrix.orgit's possible that the mysql packages on ubuntu allow connections to localhost on the socket but mariadb does not by default?16:05
@clarkb:matrix.organd that could explain the problem16:05
@jim:acmegating.comwouldn't that affect all the tests?  this is a sporadic failure, right?16:08
@clarkb:matrix.orgyes it is a sporadic failure. I'm wondering if something is modifying the environment to use localhost or maybe localhost is the default connection host and it's falling back?16:09
@clarkb:matrix.orgbut an actual connection to localhost that mysql allows and mariadb doesn't would explain the error I think16:10
@jim:acmegating.comgot it16:14
@clarkb:matrix.orgdigging around on the held node I actually can successfully connect with `mysql -u openstack_citest -h localhost -p`17:03
@clarkb:matrix.orglooks like this is less likely to be the issue17:04
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/94525317:56
@clarkb:matrix.orgpymysql debugs to stdout when the DEBUG attribute is set to True17:59
@clarkb:matrix.orga bit unorthodox but I've toggled that around connection setup and maybe that will shed more light on this18:00
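A minimal sketch of toggling a module-level debug flag around connection setup and capturing what it prints. The flag is assumed to be `pymysql.connections.DEBUG`; the `debug_flag` helper and the `fake_connections` stand-in are hypothetical names used so the example runs without a database, and this is not the code from the change under review.

```python
import contextlib
import io
import types


@contextlib.contextmanager
def debug_flag(module, attr="DEBUG"):
    """Set module.<attr> to True inside the block and capture stdout."""
    buf = io.StringIO()
    old = getattr(module, attr, False)
    setattr(module, attr, True)
    try:
        with contextlib.redirect_stdout(buf):
            yield buf
    finally:
        # Always restore the previous value, even if connecting raised.
        setattr(module, attr, old)


# Hypothetical stand-in for pymysql.connections (assumption, not the real module).
fake_connections = types.SimpleNamespace(DEBUG=False)


def fake_connect():
    # A real client would dump protocol packets here when DEBUG is on.
    if fake_connections.DEBUG:
        print("DEBUG: packet dump would appear here")
    return "connection"


with debug_flag(fake_connections) as captured:
    conn = fake_connect()
debug_output = captured.getvalue()
```

Because the output is buffered rather than printed, a caller can choose to emit `debug_output` only when the connection attempt fails, which keeps passing runs quiet.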
@clarkb:matrix.orgalso I think every error has been on noble so far. python3.11 and ubuntu jammy haven't hit it with mariadb 10.1118:00
-@gerrit:opendev.org- Clark Boylan proposed:18:12
- [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253
- [zuul/zuul] 945387: DNM does using mysql unix socket work better than tcp https://review.opendev.org/c/zuul/zuul/+/945387
@clarkb:matrix.org* that was even more verbose than I anticipated. This should make it quieter and only show the stdout on failing test cases18:12
@clarkb:matrix.orgfyi mnaser pointed out https://github.com/pypa/setuptools/pull/4870. From a quick check of its setup.cfg I think zuul is ok, but nodepool's isn't18:47
@clarkb:matrix.orghttps://zuul.opendev.org/t/zuul/stream/3e52604eeab14c34ba972e755854e60e?logfile=console.log this shows that we're getting the error from mariadb before we even attempt to authenticate. We're getting the error during the read server info step19:06
@clarkb:matrix.orgit seems like despite tools/test-setup.sh setting up rules that allow access for openstack_citest from `%` the server is deciding occasionally that this shouldn't be allowed19:07
@jim:acmegating.comis that still feeling like some kind of internal mariadb race/lock issue?19:11
@clarkb:matrix.orgyes. I don't think the flush privileges we do is necessary in per test db setup. So next round I'm going to drop that and see if it helps. Maybe when flushing privileges there is a race with new connections and clearing out the old config?19:14
@clarkb:matrix.org* since we have multiple test runners all flushing privileges each time they create a new db that feels like it could be related19:17
@clarkb:matrix.org` Fatal Python error: take_gil: PyCOND_WAIT(gil->cond) failed` is a new one. That showed up in the last remaining test job for my current stack of debugging changes20:12
@clarkb:matrix.orgI think the job is going to timeout shortly20:13
-@gerrit:opendev.org- Clark Boylan proposed:20:14
- [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253
- [zuul/zuul] 945387: DNM does using mysql unix socket work better than tcp https://review.opendev.org/c/zuul/zuul/+/945387
@mordred:waterwanders.comClark: if you use grant statements you should not need flush privileges. that's typically only needed if you grant privs by doing direct sql on the mysql access tables20:18
@mordred:waterwanders.comaha: ```DELETE FROM mysql.user WHERE User='';``` - that'll do it. are they _still_ shipping that user by default?20:23
@mordred:waterwanders.comClark: I left a comment on that DNM patch about something to look at. just fwiw20:33
@clarkb:matrix.orgmordred: ya I left the flush in test-setup.sh because it only runs once with no contention. I think mysql.user is an alias table for the real thing. And re myisam the idea is to force innodb in our own schemas iirc20:43
@mordred:waterwanders.comcool20:43
@clarkb:matrix.orgbut ya maybe this is a side effect of some initial setup though I think the stuff for openstack_citest is correct and that is the account we're failing to use (though it fails before we even provide any account info, just when the client tries to read back server info)20:44
@clarkb:matrix.orgI'll get a link to that20:44
@clarkb:matrix.orghttps://zuul.opendev.org/t/zuul/build/3e52604eeab14c34ba972e755854e60e/log/job-output.txt#4296-4312 you can see from that pymysql debug output it fails almost immediately upon connecting20:45
@clarkb:matrix.orgthe traceback is most recent call first then on the right it's the packet dump for the server response20:45
@mordred:waterwanders.comoh yeah - I see you removed the flush from the code itself. there's another one in the cleanup method (comment left)20:45
@clarkb:matrix.orgoh yup. Is it safe to not flush privileges after the two drop statements?20:46
@clarkb:matrix.orgmust be equivalent to create user so I think it is20:47
@clarkb:matrix.orgI'll get a new ps up once I have more data collected from this round of tests20:47
@mordred:waterwanders.comit should be safe to not do it there, yeah20:47
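A minimal sketch of per-test setup and cleanup that relies on account-management statements only, so no `FLUSH PRIVILEGES` is issued. The function names, the statement order, and the unescaped string formatting are assumptions for illustration; this is not the actual Zuul fixture code.

```python
def setup_statements(db_name, user, password, host="%"):
    """SQL for a per-test schema and user; GRANT takes effect without a flush."""
    return [
        f"CREATE DATABASE {db_name}",
        f"CREATE USER '{user}'@'{host}' IDENTIFIED BY '{password}'",
        f"GRANT ALL ON {db_name}.* TO '{user}'@'{host}'",
    ]


def cleanup_statements(db_name, user, host="%"):
    """The two drop statements; DROP USER also takes effect without a flush."""
    return [
        f"DROP DATABASE {db_name}",
        f"DROP USER '{user}'@'{host}'",
    ]


# Hypothetical names for one test case.
setup = setup_statements("zuul_test_1", "zuul_user_1", "secret")
cleanup = cleanup_statements("zuul_test_1", "zuul_user_1")
```

A flush is only needed when the grant tables are modified directly with `INSERT`, `UPDATE`, or `DELETE`, which neither list does.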
@mordred:waterwanders.combut yeah - that seems really weird - I would expect flush privs to be atomic anyway, but maybe it's not?20:48
@clarkb:matrix.orgit wouldn't surprise me if it is atomic when viewed via sql but maybe the connection management is doing magic and breaking that?20:48
@clarkb:matrix.orgbut ya it is really hard to explain why connections from localhost would be refused very occasionally and those create/drop user and flush privileges are the only real place we're modifying the db state around that20:49
@clarkb:matrix.orgmordred: other interesting info is this happens regularly on ubuntu jammy with mariadb 10.6 and ubuntu noble with mariadb 10.11 but I haven't caught debian bookworm mariadb 10.11 doing it yet. Also mysql on ubuntu doesn't do it20:50
@clarkb:matrix.orgit's possible this is related to distro specific mariadb configs maybe as well20:50
@clarkb:matrix.orgI'm just slowly trying to add more debugging and rule things out as I go20:50
@mordred:waterwanders.commaybe. ugh. ... are we setting ZUUL_MYSQL_HOST somewhere that I'm not seeing?20:50
@mordred:waterwanders.comwe _read_ it from the env in setUp20:50
@clarkb:matrix.orgmordred: no, that's why I added extra debugging to record the self.host and self.port values20:50
@clarkb:matrix.orgI thought maybe we were and trying to connect to something invalid but it all checked out20:51
@clarkb:matrix.orghttps://zuul.opendev.org/t/zuul/build/3e52604eeab14c34ba972e755854e60e/log/job-output.txt#4293 you can see that here20:51
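A minimal sketch of reading the connection target from the environment and logging it, as described above. `ZUUL_MYSQL_HOST` and the `127.0.0.1` default come from the discussion; the `ZUUL_MYSQL_PORT` variable, the `3306` default, and the `read_target` name are assumptions for illustration.

```python
import logging
import os

log = logging.getLogger("mysql_fixture_sketch")


def read_target(environ=os.environ):
    """Return (host, port) for the test database and log what was chosen."""
    host = environ.get("ZUUL_MYSQL_HOST", "127.0.0.1")
    # Port variable name and default are assumptions, not confirmed by the log.
    port = int(environ.get("ZUUL_MYSQL_PORT", 3306))
    log.debug("MySQL fixture target host=%r port=%r", host, port)
    return host, port


default_target = read_target({})
override_target = read_target({"ZUUL_MYSQL_HOST": "localhost"})
```

Logging the resolved values in each test run makes it possible to rule out an unexpected override of the host when a connection is refused.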
@clarkb:matrix.orgthis also seems like the sort of bug that most people would never notice if it was a race with flush privileges20:52
@mordred:waterwanders.comyeah20:52
@clarkb:matrix.orgsince most people set up accounts once (or very infrequently) then use them forever20:52
@mordred:waterwanders.comfascinating that it translates 127.0.0.1 to localhost in the error message20:53
-@gerrit:opendev.org- Aurelio Jargas proposed: [zuul/zuul] 945405: Test role ensure-python-command https://review.opendev.org/c/zuul/zuul/+/94540520:54
@clarkb:matrix.org`select * from information_schema.processlist;` shows localhost:portnumber too20:54
@clarkb:matrix.orgrather than 127.0.0.1:portnumber20:54
@clarkb:matrix.orgwhen you use the socket it just shows localhost20:54
@mordred:waterwanders.comyeah. that's the original distinction - 127.0.0.1 is for tcp, localhost is for socket. it would just be nice to not normalize that in the logging20:55
@clarkb:matrix.org++20:56
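A minimal sketch of keeping the TCP and socket cases explicit when building connection arguments. It assumes a client such as pymysql, whose `connect()` accepts `host`, `port`, and `unix_socket` and, to my understanding, only uses the socket when `unix_socket` is passed. The `connect_kwargs` helper and the socket path are hypothetical.

```python
def connect_kwargs(user, password, host="127.0.0.1", port=3306, unix_socket=None):
    """Build keyword arguments for either a TCP or a unix socket connection."""
    kwargs = {"user": user, "password": password}
    if unix_socket is not None:
        # Socket transport: host and port are not used.
        kwargs["unix_socket"] = unix_socket
    else:
        # TCP transport to an explicit address.
        kwargs["host"] = host
        kwargs["port"] = port
    return kwargs


tcp = connect_kwargs("openstack_citest", "secret")
# Socket path is an assumed example; it varies by distro and config.
sock = connect_kwargs("openstack_citest", "secret",
                      unix_socket="/run/mysqld/mysqld.sock")
```

Logging the chosen kwargs (minus the password) records which transport was requested, independent of how the server later reports the client host.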
@aureliojargas:matrix.orgGood idea. Done in https://review.opendev.org/c/zuul/zuul/+/945405. Is an empty change enough or do I have to explicitly call some job?20:57
@clarkb:matrix.orgAn empty change is probably enough. You can see the jobs enqueued for your change at https://zuul.opendev.org/t/zuul/status and many of them should run nox20:59
@clarkb:matrix.orgyou'll then be able to look at the log files for those jobs and check that the ensure-nox role ran through your proposed system and confirm it works that way in addition to the built in tests20:59
@aureliojargas:matrix.orgnice, thanks!21:00
@clarkb:matrix.orgAurelio Jargas: https://zuul.opendev.org/t/zuul/build/fb233076657a4f6194d23bbbf00c2464/log/job-output.txt#377-434 I think this is showing what I described21:04
@aureliojargas:matrix.orgIn job `zuul-build-python-release` the new role is also being called via `ensure-pyproject-build`: https://zuul.opendev.org/t/zuul/build/0d4a5280b6e5448fad506b39e65c4355/log/job-output.txt#571-60621:14
-@gerrit:opendev.org- Clark Boylan proposed:21:41
- [zuul/zuul] 945234: Try to use mariadb in unittest again https://review.opendev.org/c/zuul/zuul/+/945234
- [zuul/zuul] 945413: Make MySQL test fixture more robust https://review.opendev.org/c/zuul/zuul/+/945413
-@gerrit:opendev.org- Clark Boylan proposed:21:45
- [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253
- [zuul/zuul] 945387: DNM does using mysql unix socket work better than tcp https://review.opendev.org/c/zuul/zuul/+/945387
@clarkb:matrix.orgso I think the mysql fixture updates I'm making as part of testing this are valid for mysql or mariadb even if we don't switch. So I've gone ahead and pushed that into a base change that we can land if all continues to look well21:48
@clarkb:matrix.orgthen we can keep rerunning tests against mariadb to see if that fixed it. The last run had failures but none from connection issues. but we may have gotten lucky so more passes are good21:48
@clarkb:matrix.orgI broke the linters. But the code is valid code so I'll let it run and report before pushing a fix21:50
@clarkb:matrix.orgok that stack is looking a lot happier now. Maybe it is flush privileges. That seems like something that shouldn't break us but a lot of jobs have succeeded and those that failed did so on other issues23:02
@clarkb:matrix.orgI'll wait for them all to report before pushing the linter fix so that we've got a record of that. Anyone still in contact with LinuxJedi I wonder if this is the sort of behavior mariadb would be interested in hearing about23:02
@jim:acmegating.comthanks Clark !23:03
@mordred:waterwanders.comClark: LinuxJedi is no longer with mariadb - he's at WolfSSL doing embedded crypto for microcontrollers23:25
@clarkb:matrix.orgah TIL23:28
@clarkb:matrix.orgin that case linuxjedi is unlikely to care too much23:28
@mordred:waterwanders.comthat said - I still know a few folks there. Would a summary of the issue be "On Mariadb, having multiple clients each running "flush privileges" as part of their business occasionally causes a new connection to fail with a host unauthorized error"23:29
@clarkb:matrix.orgthat seems to be what we're converging on23:30
@clarkb:matrix.orgthe use case here is test cases create a specific schema and user for use for that test and we can have up to ~6 test cases running at once. So occasionally it is possible a flush privileges overlaps with a new connection attempt23:30
@mordred:waterwanders.com"This is behavior we've seen in a test farm with parallel tests running flush privs as part of their test setup. We've only seen it on mariadb and not on mysql. Removing the flush privs (it wasn't needed anyway) has removed the intermittent auth errors"23:30
@clarkb:matrix.orgremoving flush privileges has made the error go away so far. It isn't deterministic so I can't say for sure yet that this is definitive23:31
@clarkb:matrix.orgthey may have other ideas23:31
@clarkb:matrix.orgoh and we're being rejected in early connection setup when the client requests server info before we attempt to authenticate23:31
@mordred:waterwanders.comyah - that's consistent with host-based acl error message23:32
@mordred:waterwanders.comit's saying "nobody from localhost is authorized" - so it doesn't even get to user/password stuff yet23:33
@clarkb:matrix.orgseen with mariadb 10.6 on ubuntu jammy and 10.11 on ubuntu noble. Ubuntu's mysql 8 package on jammy and noble has been fine23:34
@clarkb:matrix.orgya23:34
@clarkb:matrix.orgthat is all distro packaged mariadb and mysql to be clear23:34
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 944274: Only consider live items on dequeue event https://review.opendev.org/c/zuul/zuul/+/94427423:40
@mordred:waterwanders.comClark: I have passed on information to one of the Mariadb support engineers who loves weird bugs23:43
@mordred:waterwanders.comI thought briefly about submitting a bug - but that would require signing up for an account on a jira, and I don't care THAT much.  So I've done bug-report-as-linkedin-message - and I think my transformation to corporate drone should be considered complete23:44
@clarkb:matrix.orgha23:44
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 945418: Move zuul-nox-py311-multi-scheduler to the experimental pipeline https://review.opendev.org/c/zuul/zuul/+/94541823:48
-@gerrit:opendev.org- Clark Boylan proposed:23:48
- [zuul/zuul] 945413: Make MySQL test fixture more robust https://review.opendev.org/c/zuul/zuul/+/945413
- [zuul/zuul] 945234: Try to use mariadb in unittest again https://review.opendev.org/c/zuul/zuul/+/945234
- [zuul/zuul] 945253: DNM run lots of unittests to check mariadb instead of mysql https://review.opendev.org/c/zuul/zuul/+/945253
- [zuul/zuul] 945387: DNM does using mysql unix socket work better than tcp https://review.opendev.org/c/zuul/zuul/+/945387
@clarkb:matrix.orgok that should pass linting now and appears to be safe for mysql too. So I think we should land 945413 either way then look at landing 945234 if testing continues to look happy23:49
-@gerrit:opendev.org- Zuul merged on behalf of Felix Edel: [zuul/zuul] 945104: Fix promote event handling for github changes https://review.opendev.org/c/zuul/zuul/+/94510423:55
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 945419: Add logging of kubectl process group id to executor https://review.opendev.org/c/zuul/zuul/+/94541923:57
