*** tosky has quit IRC | 00:11 | |
ianw | The site uses SHA1 in signatures which is not allowed in the DEFAULT policy per Fedora 33 Change: | 00:44 |
ianw | i.e. you now can't connect to the rax emergency console via firefox :/ | 00:45 |
ianw | i'm looking at the upgrade of the afs db servers, and it's probably a better idea to not do this over ssh | 00:45 |
*** LowKey has quit IRC | 01:28 | |
*** LowKey has joined #opendev | 01:28 | |
auristor | ianw: I've mentioned it before but will bring it up again. two afs database servers provide redundancy for reads but no redundancy for writes, as only the server with the lowest ip address can be elected the coordinator (afsdb02.openstack.org in this case). | 02:43 |
auristor | can a third database server be added? perhaps co-locate with afs01.ord.openstack.org if bringing up another VM is not an option. | 02:45 |
ianw | auristor: i could add another. but if i take, say, afs02 down, doesn't afs01 become the lowest ip (i.e. the only)? | 02:48 |
auristor | no | 02:50 |
auristor | the ubik quorum is defined by the list of voting primary ip addresses as specified in the ubik service's CellServDB file. | 02:51 |
auristor | The server with the lowest ip address gets 1.5 votes and the others 1 vote. To win election requires greater than 50% of the votes. In a two server configuration there are a total of 2.5 votes to cast. 1.5 > 2.5/2 so afsdb02.openstack.org always wins regardless of what afsdb01.openstack.org says. And afsdb01.openstack.org can never win because 1 < 2.5/2. | 02:53 |
auristor | by adding a third ubik server to the quorum, the total votes cast are 3.5 and it always requires the vote of two servers to elect a winner | 02:54 |
ianw | auristor: thanks for the detailed info. i'd assumed that "bos stop"-ing it would pull it, i didn't realise it was based on the config file. i can look at adding a new server | 02:58 |
auristor | if afsdb03 is added with the highest ip address, then either afsdb01 or afsdb02 can be elected | 02:58 |
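A minimal Python sketch of the vote arithmetic auristor describes. It assumes a simplified model (not ubik's actual implementation) in which every reachable server votes for the lowest-addressed reachable server, and the weights are fixed by the configured server list rather than by which servers are running. The IP addresses used below are hypothetical placeholders, not the real afsdb addresses.

```python
import ipaddress
from typing import Dict, Optional, Set


def vote_weights(servers: Dict[str, str]) -> Dict[str, float]:
    """Lowest IP address gets 1.5 votes; every other voting server gets 1."""
    lowest = min(servers, key=lambda n: ipaddress.ip_address(servers[n]))
    return {n: 1.5 if n == lowest else 1.0 for n in servers}


def elected(servers: Dict[str, str], up: Set[str]) -> Optional[str]:
    """Return the server that can win an election when only `up` are reachable.

    Simplified model (assumption): every reachable server votes for the
    lowest-addressed reachable server, and a win needs strictly more than
    half of the votes of the *configured* quorum (the CellServDB list).
    """
    weights = vote_weights(servers)  # weights come from the config, not from liveness
    total = sum(weights.values())
    live = [n for n in servers if n in up]
    if not live:
        return None
    candidate = min(live, key=lambda n: ipaddress.ip_address(servers[n]))
    votes = sum(weights[n] for n in live)
    return candidate if votes > total / 2 else None
```

With `two = {"afsdb01": "10.0.0.2", "afsdb02": "10.0.0.1"}`, `elected(two, {"afsdb01"})` returns `None` (1 < 1.25), while adding a third, higher-addressed server lets either afsdb01 or afsdb02 win as long as one peer is reachable (2 or 2.5 > 1.75).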
ianw | for reference, it looks like the openssl on the RAX console servers is borked. https://github.com/openssl/openssl/issues/7126 | 02:59 |
ianw | openssl s_client -connect ord-novnc.servers.console.rackspacecloud.com:443 -cipher DEFAULT@SECLEVEL=1 works | 03:00 |
ianw | openssl s_client -connect ord-novnc.servers.console.rackspacecloud.com:443 -cipher DEFAULT@SECLEVEL=2 does not | 03:00 |
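The two `s_client` runs above can be reproduced with Python's standard `ssl` module, which accepts the same OpenSSL cipher string. A minimal sketch: `tls_context` and `handshake_ok` are hypothetical helper names, and certificate checks are switched off on the assumption that only the handshake matters here (as with a bare `s_client` probe). Whether level 1 is enough depends on the local OpenSSL build, since newer releases may reject SHA1 signatures at level 1 as well.

```python
import socket
import ssl


def tls_context(seclevel: int) -> ssl.SSLContext:
    """Client context roughly equivalent to `-cipher DEFAULT@SECLEVEL=<n>`.

    Verification is disabled because this is a handshake probe only.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False  # must be turned off before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    ctx.set_ciphers(f"DEFAULT@SECLEVEL={seclevel}")
    return ctx


def handshake_ok(host: str, seclevel: int, port: int = 443, timeout: float = 10.0) -> bool:
    """Attempt a TLS handshake at the given security level; False on TLS errors."""
    ctx = tls_context(seclevel)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLError:
        return False
```

Calling `handshake_ok("ord-novnc.servers.console.rackspacecloud.com", 1)` and then the same with `2` should show the same split as the two commands above, provided the local OpenSSL behaves like the one used there.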
*** cloudnull has joined #opendev | 03:15 | |
*** ykarel has joined #opendev | 03:16 | |
ianw | everything points to the same issue as the github one. sslprobe shows it as an affected system. https://decoder.link/trace shows the remote site responding with "Signature Algorithm sha1+rsa (2+1)" | 03:24 |
*** akrpan-pure has joined #opendev | 03:34 | |
*** LowKey has quit IRC | 04:22 | |
*** akrpan-pure has quit IRC | 04:34 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add afsdb03 openstack.org https://review.opendev.org/c/opendev/system-config/+/777924 | 04:55 |
*** cloudnull has quit IRC | 04:58 | |
ianw | infra-root (corvus): ^ reviews appreciated but i believe we (I) want to babysit that by merging, deploying, adding dns SRV records, then restarting afsdb processes. i have added the server to dns but not the srv records ATM | 04:58 |
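For reference, AFS clients can find database servers through DNS SRV records; RFC 5864 names them `_afs3-vlserver._udp.<cell>` and `_afs3-prserver._udp.<cell>`, and the vlserver and ptserver listen on ports 7003 and 7002. A small Python sketch that renders zone-file style lines for a list of db servers. The helper name, TTL, priority and weight are illustrative assumptions, not what is actually configured in the openstack.org zone.

```python
from typing import List

# AFS database service SRV names (RFC 5864) and their well-known ports
AFS_SERVICES = {"_afs3-vlserver._udp": 7003, "_afs3-prserver._udp": 7002}


def srv_records(cell: str, hosts: List[str], ttl: int = 300,
                priority: int = 10, weight: int = 10) -> List[str]:
    """Render one SRV line per (service, host) pair, grouped by service."""
    lines = []
    for service, port in AFS_SERVICES.items():
        for host in hosts:
            lines.append(f"{service}.{cell}. {ttl} IN SRV {priority} {weight} {port} {host}.")
    return lines
```

Adding afsdb03 then amounts to two new lines, one per service, next to the existing afsdb01/afsdb02 entries.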
*** cloudnull has joined #opendev | 05:00 | |
*** akrpan-pure has joined #opendev | 05:20 | |
*** cloudnull has quit IRC | 05:31 | |
*** cloudnull has joined #opendev | 05:33 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add review01.opendev.org https://review.opendev.org/c/opendev/system-config/+/777925 | 05:36 |
*** akrpan-pure has quit IRC | 05:49 | |
*** marios has joined #opendev | 05:56 | |
*** cloudnull has quit IRC | 06:13 | |
*** cloudnull has joined #opendev | 06:14 | |
openstackgerrit | Ian Wienand proposed opendev/zone-opendev.org master: Update review01.opendev.org entries https://review.opendev.org/c/opendev/zone-opendev.org/+/777926 | 06:15 |
*** ralonsoh has joined #opendev | 06:32 | |
*** slaweq has joined #opendev | 07:08 | |
*** cloudnull has quit IRC | 07:10 | |
*** whoami-rajat has joined #opendev | 07:11 | |
*** cloudnull has joined #opendev | 07:15 | |
*** ykarel has quit IRC | 07:20 | |
*** sboyron has joined #opendev | 07:21 | |
*** ykarel has joined #opendev | 07:26 | |
*** eolivare has joined #opendev | 07:35 | |
*** brinzhang has joined #opendev | 07:37 | |
*** rpittau|afk is now known as rpittau | 08:04 | |
*** cloudnull has quit IRC | 08:10 | |
*** cloudnull has joined #opendev | 08:14 | |
*** tosky has joined #opendev | 08:20 | |
*** fressi has joined #opendev | 08:23 | |
*** hemanth_n has joined #opendev | 08:34 | |
*** gnuoy has quit IRC | 08:57 | |
*** zbr1 has joined #opendev | 08:58 | |
*** zbr has quit IRC | 09:01 | |
*** zbr1 is now known as zbr | 09:01 | |
*** zbr8 has joined #opendev | 09:06 | |
*** ykarel is now known as ykarel|lunch | 09:06 | |
*** zoharm has joined #opendev | 09:07 | |
*** zbr has quit IRC | 09:08 | |
*** zbr8 is now known as zbr | 09:08 | |
*** brinzhang_ has joined #opendev | 09:15 | |
*** brinzhang has quit IRC | 09:18 | |
*** ttx has quit IRC | 09:20 | |
*** ttx has joined #opendev | 09:21 | |
*** hashar has joined #opendev | 09:21 | |
*** zbr1 has joined #opendev | 09:27 | |
*** zbr1 has quit IRC | 09:27 | |
*** zbr has quit IRC | 09:29 | |
*** jpenag is now known as jpena | 09:42 | |
*** zbr has joined #opendev | 09:43 | |
*** ykarel|lunch is now known as ykarel | 09:54 | |
openstackgerrit | Jonathan Rosser proposed opendev/system-config master: Add Debian Bullseye to the reprepro config https://review.opendev.org/c/opendev/system-config/+/777968 | 10:05 |
*** calcmandan has quit IRC | 10:25 | |
*** calcmandan has joined #opendev | 10:25 | |
openstackgerrit | Jonathan Rosser proposed opendev/system-config master: Add Debian Bullseye to the reprepro config https://review.opendev.org/c/opendev/system-config/+/777968 | 10:27 |
*** fressi has left #opendev | 10:38 | |
*** fbo|off is now known as fbo | 10:52 | |
*** fressi has joined #opendev | 10:54 | |
*** toomer has joined #opendev | 11:22 | |
*** yoctozepto9 has joined #opendev | 12:17 | |
*** yoctozepto has quit IRC | 12:17 | |
*** yoctozepto9 is now known as yoctozepto | 12:17 | |
*** lpetrut has joined #opendev | 12:30 | |
*** jpena is now known as jpena|lunch | 12:32 | |
*** zbr has quit IRC | 12:34 | |
*** zbr has joined #opendev | 12:36 | |
*** hemanth_n has quit IRC | 12:44 | |
*** tbarron|out is now known as tbarron | 12:51 | |
*** mkowalski has quit IRC | 12:59 | |
*** mkowalski has joined #opendev | 13:00 | |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: Add tripleo-ci-health-queries project https://review.opendev.org/c/openstack/project-config/+/777991 | 13:07 |
*** ykarel has quit IRC | 13:19 | |
*** ykarel has joined #opendev | 13:20 | |
*** jpena|lunch is now known as jpena | 13:24 | |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: Add tripleo-ci-health-queries project https://review.opendev.org/c/openstack/project-config/+/777991 | 13:27 |
fungi | zbr: so i gather you have a repo you want to import which will use a different default branch than master. now that we have a gerrit version which can support it, we'll need to make sure that on import the default branch gets set to the correct name, and that replication to gitea respects it as well | 14:26 |
zbr | i am wondering if it worth the trouble, i could rename the branch on the source repo to avoid the extra effort | 14:27 |
zbr | somehow I have the impression that there are lots of bugs related to using a default branch other than master. | 14:27 |
fungi | also since this looks like it's going to be an official tripleo deliverable, you probably want to be sure that team is okay with having one repo with a default branch named differently than its others | 14:28 |
zbr | and i am not sure yet if I can afford to pay the price to be the first | 14:28 |
*** gnuoy has joined #opendev | 14:29 | |
*** ralonsoh_ has joined #opendev | 14:30 | |
*** ralonsoh has quit IRC | 14:30 | |
fungi | we have a high level of confidence that zuul, gerrit and gitea will behave themselves regardless of the default branch name (though you may need to tell zuul what default branch to look for in that repo), it's mostly our project creation and import scripts which are likely to need some tweaks | 14:36 |
fungi | but yes, the easy workaround is to just rename that branch to master on the repo you want imported | 14:37 |
zbr | i already switched source to use master, for this case is better to keep consistency. | 14:37 |
*** ralonsoh_ is now known as ralonsoh | 14:37 | |
zbr | if it had been a purely new project, not a utility repo, it would maybe be worth the effort. | 14:38 |
zbr | i did hear something about git itself switching to main for new repos but I am not sure if that went out yet. | 14:38 |
fungi | yeah, i've been following their mailing list. there was a proposal in december, but so far nothing which has been accepted | 14:39 |
fungi | though there didn't seem to be a lot of resistance to the idea, so i expect it will happen at some point | 14:39 |
*** fressi has quit IRC | 14:40 | |
zbr | funny bit is that git itself is an offensive word (not the tool) | 14:40 |
fungi | zbr: unrelated (i think), but we have tests passing for everything i want to see in git-review 2.0.0 now, it's all under topic:gitreview-2 | 14:41 |
fungi | it's that one late entrant data loss fix and supporting test helpers, plus minor python packaging and docs cleanup | 14:42 |
zbr | i will have a look now. i was reading the updates and waiting for a ping. | 14:42 |
fungi | and the python 2.9 testing | 14:42 |
zbr | 2.9? i hope it's a typo ;) | 14:42 |
fungi | hah, 3.9 yes ;) | 14:42 |
fungi | zbr: 777848 might be good to go in first, since that takes care of a lot of random job failures due to test timeouts | 14:44 |
fungi | otherwise there will probably be more rechecking to get the others to merge | 14:44 |
zbr | I already +W'd it and another one; once those are done i will get the others one by one. | 14:50 |
fungi | thanks! once things are merged i'll push a 2.0.0.0rc1 prerelease as a test, to make sure everything's working for release publication (like i did with bindep) | 15:01 |
*** Dmitrii-Sh has quit IRC | 15:10 | |
*** Dmitrii-Sh has joined #opendev | 15:17 | |
*** artom has joined #opendev | 15:20 | |
*** lpetrut has quit IRC | 15:29 | |
corvus | fungi: have we thought about granting all auth users the ability to edit hashtags? | 15:33 |
fungi | corvus: yes, i think some projects were just testing them out first before we looked at setting it in the central config | 15:35 |
corvus | fungi: looks like they're all -core though | 15:35 |
fungi | right, concern was raised by those teams that random users might unset hashtags on changes. i think that's probably no more of a concern than other sorts of vandalism already available to them though | 15:36 |
fungi | i think ironic was talking about extending it to all users on theirs once they were comfortable with how it was working for core reviewers | 15:36 |
corvus | yeah. it's *really* useful for all contributors to be able to help organize changes for review. it was invaluable for me as a non-core gerrit contributor during the gerrit hackathon. from my pov, if someone gets over-eager and adds the "critical-bugfix" tag to a change which isn't -- that's easy to gently correct. | 15:38 |
toomer | Hi fungi, We got another issue pushing the code to OpenDev Gerrit, this time for the openstack/releases repository. | 15:39 |
fungi | toomer: thanks, same error again? | 15:40 |
toomer | Would you have some time to investigate it together ? | 15:40 |
openstackgerrit | James E. Blair proposed openstack/project-config master: Allow all registered users to edit hashtags on all Zuul projects https://review.opendev.org/c/openstack/project-config/+/778012 | 15:40 |
toomer | Yes, let me paste the error | 15:40 |
fungi | i see it in the gerrit error log | 15:40 |
toomer | http://paste.openstack.org/show/803099/ | 15:41 |
fungi | looks like gibi was encountering it | 15:41 |
toomer | I have done git fsck - it looks good | 15:41 |
gibi | yepp, the commit is originating from me but it goes through gerrit.nordix.org (for legal reasons) | 15:41 |
toomer | I also see the missing tree: 98eedc4dae5087acbb38d5b2a0764393539ad098 | 15:41 |
gibi | hence toomer (who is admin there) investigating | 15:42 |
fungi | what does "goes through" mean? | 15:42 |
gibi | fungi: I push to gerrit.nordix.org and a jenkins job there pushes it forward to review.opendev.org | 15:42 |
fungi | oh, got it, so it's git run by some jenkins job which is encountering the error | 15:43 |
gibi | yes | 15:43 |
toomer | We have mirrors of OpenDev repositories which are then pushed by our Jenkins to OpenDev | 15:43 |
gibi | fungi: I think this is visible for the public https://jenkins.nordix.org/job/opendev-openstack-releases-push-upstream/152/console | 15:43 |
fungi | thanks, yep i can try to correlate https://jenkins.nordix.org/job/opendev-openstack-releases-push-upstream/ failures to the timestamps we see in the gerrit error log too | 15:45 |
fungi | any idea what version of git it's using? | 15:45 |
toomer | I'm on the slave server from which the pushes are made ... | 15:45 |
toomer | git version 2.17.1 | 15:45 |
toomer | Can you share the stack from server ? | 15:46 |
toomer | Maybe we will be able to figure something out | 15:46 |
fungi | sure, just a sec | 15:46 |
fungi | toomer: http://paste.openstack.org/show/803101 | 15:50 |
*** zoharm has quit IRC | 15:50 | |
fungi | and `git show 98eedc4dae5087acbb38d5b2a0764393539ad098` in that bare repo on the gerrit server's filesystem does have a tree in it | 15:51 |
toomer | yes | 15:51 |
toomer | http://paste.openstack.org/show/803099/ | 15:51 |
clarkb | my suggestion last week was that maybe it is git protocol v1 that new gerrit is struggling with and it would be worth trying v2 | 15:52 |
fungi | yep, i mean i ran git show on the gerrit server too, and it has that tree | 15:52 |
fungi | but confirming it's also present on the client is good | 15:53 |
fungi | thanks | 15:53 |
toomer | It's on the Gerrit server as well | 15:56 |
toomer | http://paste.openstack.org/show/803102/ | 15:56 |
toomer | Here are the last 3 commits on the OpenDev for the release repo | 15:58 |
toomer | 5318b88b7 Zuul Merge "Release versions for ansible-roles" | 15:58 |
toomer | 1fea88ecc Zuul Merge "ldappool 3.0.0" | 15:58 |
toomer | 1781bc796 Zuul Merge "Optional list of changes in commit message for auto release" | 15:58 |
toomer | Here are the last 3 commits for the gibi change | 15:58 |
fungi | cool, so basically at this point we have our gerrit claiming unreachability in the push, (perhaps erroneously) faulting that tree for missing, then eventually working fine at some later time | 15:58 |
toomer | e22050a76 Balazs Gibizer Wallaby Cycle Highlight for Nova | 15:58 |
toomer | 1fea88ecc Zuul Merge "ldappool 3.0.0" | 15:58 |
toomer | 1781bc796 Zuul Merge "Optional list of changes in commit message for auto release" | 15:58 |
gibi | fungi: yeah, this type off issue so far always resolved magically in less than 24 hours | 15:59 |
gibi | s/off/of | 15:59 |
fungi | i'm with clarkb that there's a good chance this is some incorrect optimization in the packfile getting pushed, and that using git protocol v2 might eliminate it. but debuggability for this is sorely lacking | 15:59 |
clarkb | fungi: it could also be that the game of telephone here is more susceptible to this issue | 16:01 |
clarkb | maybe gibi can try pushing directly ? | 16:01 |
toomer | These options could give us some more info on this | 16:01 |
toomer | GIT_CURL_VERBOSE=1 GIT_TRACE=1 | 16:01 |
fungi | yeah, wouldn't hurt to export those in your job at least temporarily and try to catch another failure | 16:02 |
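A minimal sketch of exporting those two variables for a single push from a Python-driven job, assuming the job can wrap its `git push` call; `traced_env` is a hypothetical helper name. The verbose HTTP output may include authentication details, so it is worth scrubbing before pasting it anywhere public.

```python
import os
from typing import Dict, Mapping, Optional


def traced_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and enable git's general and HTTP tracing."""
    env = dict(os.environ if base is None else base)
    env["GIT_TRACE"] = "1"         # trace git's internal command execution
    env["GIT_CURL_VERBOSE"] = "1"  # dump libcurl's HTTP(S) conversation
    return env
```

Usage would look like `subprocess.run(push_argv, env=traced_env(), check=True)`, leaving the rest of the job's environment untouched.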
gibi | clarkb: I can try that as a test but if that succeeds then we lose the ability to reproduce with this patch from nordix. And normally I have to push through nordix for legal reasons | 16:02 |
clarkb | gibi: right, I'm just pointing out we are going through potentially 4 different git versions here to push code and it's breaking. 99.99% of all other pushes involve two | 16:03 |
clarkb | (in this case it's opendev gerrit jgit, nordix gerrit, your client, jenkins node client) | 16:03 |
gibi | clarkb: sure. I can unblock my work by pushing directly but that is just a one time solution | 16:04 |
clarkb | well, I'm less interested from an unblocking of work and more from a debugging standpoint. It would be useful info to know if eliminating the two intermediate git versions functions more reliably | 16:05 |
clarkb | we also learn something if it doesn't work more reliably | 16:05 |
gibi | fungi, toomer: if you agree then I can try the direct push | 16:05 |
gibi | my local git client is | 16:05 |
gibi | $ git --version | 16:05 |
gibi | git version 2.30.1 | 16:05 |
gibi | clarkb: meanwhile I was able to push through other patches via nordix, e.g. in nova. | 16:06 |
toomer | I would prefer to spend some more time investigating this before we try direct push | 16:07 |
gibi | ack | 16:08 |
toomer | Here is a GIT trace | 16:08 |
toomer | http://paste.openstack.org/show/803103/ | 16:08 |
fungi | i'm in a meeting for the next ~50 minutes (and another after that) so i'm less available but will try to take a look momentarily | 16:11 |
clarkb | (I too have a meeting) | 16:12 |
toomer | I don't so I will keep looking on this | 16:17 |
*** ykarel is now known as ykarel|away | 16:28 | |
openstackgerrit | James E. Blair proposed openstack/project-config master: Allow all registered users to edit hashtags on all Zuul projects https://review.opendev.org/c/openstack/project-config/+/778012 | 16:31 |
*** ykarel|away has quit IRC | 16:33 | |
toomer | It looks like this type of problem is caused by an optimization which makes git send as little data as possible over the network for the uploaded change | 16:35 |
toomer | Based on https://stackoverflow.com/questions/16586642/git-unpack-error-on-push-to-gerrit there is a workaround for this issue | 16:36 |
clarkb | toomer: that is why I suspected the git protocol previously, since that affects how git optimizes the pushes | 16:37 |
toomer | --[no-]thin | 16:37 |
toomer | These options are passed to git-send-pack(1). A thin transfer significantly reduces the amount of sent data when the sender and receiver | 16:37 |
toomer | share many of the same objects in common. The default is --thin. | 16:37 |
clarkb | toomer: I had thought that you may be using v2 which we enabled on the server side, but 2.17.1 is v1 only. It is possible that v2 optimizes this problem properly though and may be worth trying a v2 client | 16:38 |
toomer | Using --no-thin will increase the load on the OpenDev Gerrit | 16:40 |
clarkb | yes, if v2 works using that would be preferable | 16:40 |
clarkb | but you need git >=2.18 for support and 2.26 to use it by default | 16:40 |
toomer | clarkb: Do you know since which git version the v2 is supported ? | 16:40 |
toomer | Thanks | 16:41 |
toomer | git -c protocol.version=2 will enforce the new version on git clients >=2.18 | 16:44 |
clarkb | yup and 2.26 or newer should use v2 by default | 16:44 |
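A small Python sketch that checks a `git --version` string against the thresholds quoted above (2.18 to opt in, 2.26 for the default); `protocol_v2_support` is a hypothetical helper. Defaults have shifted between individual git releases, so passing `-c protocol.version=2` explicitly is the less ambiguous option whenever the client is at least 2.18.

```python
import re
from typing import Tuple


def protocol_v2_support(version_output: str) -> Tuple[bool, bool]:
    """Return (can_opt_in_to_v2, v2_by_default) for `git --version` output.

    Thresholds follow the discussion: 2.18 can opt in with
    `-c protocol.version=2`, 2.26 and newer are said to use v2 by default.
    """
    m = re.search(r"(\d+)\.(\d+)", version_output)
    if m is None:
        raise ValueError(f"unrecognised git version string: {version_output!r}")
    major_minor = (int(m.group(1)), int(m.group(2)))
    return major_minor >= (2, 18), major_minor >= (2, 26)
```

Applied to the versions in this discussion: the Jenkins node's 2.17.1 can't use v2 at all, 2.25.1 can opt in, and gibi's 2.30.1 falls in the default range.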
toomer | It looks like the --no-thin option helped, but I'm not 100% sure | 16:46 |
toomer | http://paste.openstack.org/show/803106/ | 16:46 |
toomer | Maybe the problem just went away as previously | 16:46 |
clarkb | ya hard to isolate when the problem mysteriously fixes itself | 16:47 |
fungi | right, i expect you'll have to keep a potential fix in place for a while and see if the problem comes back | 16:48 |
toomer | My plan is to use the --no-thin option for the time being and then upgrade the jenkins slaves to Ubuntu 20.04 and git 2.25.1 | 16:48 |
toomer | and use protocol v2 as suggested by clarkb | 16:49 |
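A minimal sketch of the push command for both stages of that plan, assuming the Jenkins job pushes HEAD to Gerrit's `refs/for/<branch>` ref through a remote named `gerrit` (both assumptions about the Nordix job; `gerrit_push_cmd` is a hypothetical helper). Git's protocol v2 documentation only describes fetch-side commands (`ls-refs`, `fetch`), so whether the v2 setting changes push behaviour at all is worth confirming with a trace.

```python
from typing import List


def gerrit_push_cmd(remote: str, branch: str, thin: bool = True,
                    protocol_v2: bool = False) -> List[str]:
    """Build a `git push` argv that uploads HEAD as a Gerrit change."""
    cmd = ["git"]
    if protocol_v2:
        cmd += ["-c", "protocol.version=2"]  # needs a git client >= 2.18
    cmd.append("push")
    if not thin:
        # self-contained pack: no deltas against objects only the server has
        cmd.append("--no-thin")
    cmd += [remote, f"HEAD:refs/for/{branch}"]  # Gerrit's ref for new changes
    return cmd
```

The stopgap is `gerrit_push_cmd("gerrit", "master", thin=False)`; after the upgrade, `protocol_v2=True` can be added. Dropping `thin=False` afterwards is a separate decision, since a self-contained pack costs more bandwidth and server load.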
fungi | toomer: good find! and excellent info, i see the same failure occasionally in our log for other users as well, so now we know what we can perhaps suggest to them as a workaround. thanks! | 16:49 |
fungi | infra-root: ^ something to keep in mind if any other users report "missing tree" errors when pushing changes | 16:50 |
*** rpittau is now known as rpittau|afk | 17:00 | |
*** marios is now known as marios|out | 17:07 | |
fungi | zbr: d'oh! my fault, 777799 is going to need to merge before any others, since it fixes a configuration error related to the switch to the opendev tenant | 17:15 |
*** marios|out has quit IRC | 17:16 | |
openstackgerrit | Merged openstack/project-config master: Allow all registered users to edit hashtags on all Zuul projects https://review.opendev.org/c/openstack/project-config/+/778012 | 17:23 |
*** eolivare has quit IRC | 17:38 | |
openstackgerrit | Merged opendev/git-review master: Update jobs for opendev tenant https://review.opendev.org/c/opendev/git-review/+/777799 | 17:41 |
*** mlavalle has joined #opendev | 17:58 | |
*** ralonsoh has quit IRC | 18:08 | |
*** jpena is now known as jpena|off | 18:10 | |
*** hashar has quit IRC | 18:10 | |
openstackgerrit | Merged opendev/git-review master: Increase test timeout to 5 minutes https://review.opendev.org/c/opendev/git-review/+/777848 | 18:28 |
*** sboyron has quit IRC | 18:46 | |
fungi | #status log filed spamhaus css removal for lists.katacontainers.io ipv6 address | 18:47 |
openstackstatus | fungi: finished logging | 18:47 |
fungi | #status log filed spamhaus pbl removal for lists.katacontainers.io ipv4 address | 18:47 |
openstackstatus | fungi: finished logging | 18:47 |
fungi | ianw: was there a theory as to why the gitea01 backups are still failing? i don't remember now | 18:54 |
clarkb | fungi: I theorized that maybe we had restarted the mariadb container around that time, which would have killed the connection doing the dump. I think that was ruled out though and haven't heard any new theories since | 18:57 |
fungi | yeah, it seems to happen daily | 19:09 |
fungi | based on e-mails coming to the shared root inbox | 19:10 |
*** stand has joined #opendev | 19:12 | |
*** hamalq has joined #opendev | 19:24 | |
openstackgerrit | Jeremy Stanley proposed opendev/git-review master: Remove comments for unstaged/uncommitted tests https://review.opendev.org/c/opendev/git-review/+/778056 | 19:30 |
*** toomer has quit IRC | 19:33 | |
openstackgerrit | Merged opendev/git-review master: Add test helpers for unstaged/uncommitted changes https://review.opendev.org/c/opendev/git-review/+/777687 | 19:36 |
*** stevebaker has joined #opendev | 20:01 | |
fungi | wow, we're already almost caught up on node requests for the day, and only topped out at a backlog of ~400. that's a pleasant change | 20:16 |
fungi | okay, digging into gitea01 backup failures, "Streaming script /etc/borg-streams/mysql failed!" even though the line immediately prior to that says "terminating with success status, rc 0" | 20:20 |
fungi | something doesn't add up | 20:20 |
fungi | ahh | 20:20 |
fungi | "mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `action` at row: 7064" | 20:21 |
fungi | i should have scrolled up farther | 20:21 |
fungi | it's a local mysqld though, so unlikely to be network connectivity at fault | 20:22 |
clarkb | unless the process is dying or something like that | 20:22 |
clarkb | may also be a local socket vs tcp thing? | 20:22 |
fungi | process start time for mysqld was Feb23 | 20:23 |
fungi | so running at least a week | 20:23 |
fungi | going to see if i can work out where the mysql container stores logs | 20:23 |
fungi | oh, says it's syslogging it | 20:25 |
fungi | also the dump didn't get oom-killed, last oom killer event logged was almost a month ago | 20:26 |
fungi | i don't actually see anything logged by mariadb in the syslog | 20:27 |
fungi | oho! here we go... it's in /var/log/containers/docker-mariadb.log.1 | 20:28 |
fungi | "Mar 1 05:54:50 gitea01 docker-mariadb[704]: 2021-03-01 5:54:50 192568 [Warning] Aborted connection 192568 to db: 'gitea' user: 'root' host: 'localhost' (Got an error writing communication packets)" | 20:28 |
fungi | not that it's all that helpful | 20:28 |
clarkb | fungi: host localhost may be the local unix socket not tcp | 20:29 |
clarkb | maybe we should switch it to 127.0.0.1 to use tcp | 20:29 |
fungi | https://cloudlinux.zendesk.com/hc/en-us/articles/360010985240-MySQL-5-7-and-Got-an-error-writing-communication-packets-message | 20:29 |
fungi | that suggests the query cache could be involved | 20:29 |
fungi | refers to https://bugs.mysql.com/bug.php?id=84639 | 20:30 |
clarkb | I mention the socket thing because it is in a container and the backup runs from the host iirc ? | 20:30 |
clarkb | but I could be wrong about that | 20:30 |
fungi | looking at that bug, maybe running mysqldump from within the container could help? | 20:31 |
clarkb | fungi: you should be able to use tcp from the host, or yeah, doing the dump within the container would also get you socket access | 20:34 |
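To compare the connection options raised here, a sketch that builds the three `mysqldump` invocations. With MySQL and MariaDB clients, `-h localhost` means the local unix socket, while `--protocol=TCP` with `127.0.0.1` forces TCP. The container name `mariadb` and the helper name `dump_cmd` are hypothetical, and credentials are assumed to come from an option file, so they are omitted.

```python
from typing import List


def dump_cmd(database: str, mode: str = "tcp", container: str = "mariadb") -> List[str]:
    """Build a mysqldump argv for one of the connection options discussed.

    socket:    "-h localhost" makes the client use the unix socket
    tcp:       force a TCP connection to 127.0.0.1
    container: run mysqldump inside the database container via `docker exec`
    """
    if mode == "socket":
        return ["mysqldump", "-h", "localhost", database]
    if mode == "tcp":
        return ["mysqldump", "--protocol=TCP", "-h", "127.0.0.1", database]
    if mode == "container":
        return ["docker", "exec", container, "mysqldump", database]
    raise ValueError(f"unknown mode: {mode!r}")
```

Switching the borg stream script between these would help isolate the socket theory, though it would not rule out the remote side of the stream dropping.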
fungi | doubtful it's actually the query cache bug, since mariadb docs say with 10.1.7 and later (we're using 10.4) you have to explicitly set a nonzero query_cache_size, and i can't find anywhere we're doing that | 20:37 |
fungi | i don't expect they would do that in the container image we're consuming either | 20:37 |
fungi | also leaves me wondering why we don't hit this with, say, the gerrit backups | 20:39 |
clarkb | gerrit is remote | 20:40 |
clarkb | so you have to use tcp | 20:40 |
clarkb | (this is why I suspect the socket :) ) | 20:40 |
*** mgagne has joined #opendev | 20:41 | |
ianw | fungi: yeah i had a quick look yesterday at gitea backups. it seems to be isolated to failures to vexxhost | 20:47 |
ianw | backing-up to vexxhost i should say. which suggests to me networking; because it sometimes works | 20:48 |
clarkb | ianw: oh it works backing up to rax? | 20:48 |
fungi | could it be that the problem is not actually the local connection between the cronjob and the mariadb server, but the remote borg ssh socket dying and that's prematurely terminating the query stream? | 20:48 |
openstackgerrit | Mathieu Gagné proposed openstack/project-config master: Enable inap-mtl01 region https://review.opendev.org/c/openstack/project-config/+/778064 | 20:50 |
ianw | the error points are suspiciously similar http://paste.openstack.org/show/BR1pVA2GsVGOAZwCk3xJ/ | 20:50 |
clarkb | mgagne: re ^ I take it that implies the leaky IP address problem is expected to be happier now? | 20:51 |
ianw | three days in a row it failed at "Lost connection to MySQL server during query when dumping table `action` at row: 6968" | 20:51 |
clarkb | mgagne: also I want to say the old limit was ~159 not 200. I'm good with 200 if you are :) | 20:51 |
mgagne | that's what we have, assuming it's still 8 VCPUs per VM | 20:51 |
mgagne | Our version does not support the option suggested by melwitt, but all compute nodes have been emptied from the previous instances so no leftover here. | 20:52 |
mgagne | old limit was ~190 IIRC | 20:52 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add tools being used to make sense of gerrit account inconsistencies https://review.opendev.org/c/opendev/system-config/+/777846 | 20:53 |
clarkb | fungi: ^ that tries to address your comments. I've run it as an unprivileged user. The next step is to run it as an admin which should get rid of the "this email has only one account" issues and then see what our dataset looks like | 20:54 |
clarkb | fungi: if you get a chance to review that before we run it as admin that would be excellent | 20:54 |
clarkb | mgagne: oh sorry looks like the old value we had for inap max-servers was 95. I'm happy to try at 200 and take it from there though | 20:55 |
fungi | sure thing | 20:55 |
clarkb | mgagne: I'll go ahead and approve the change, thank you | 20:55 |
fungi | yes, huge thanks! this will help tremendously | 20:55 |
*** toomer has joined #opendev | 20:55 | |
mgagne | let me know if you see any issue so we can address them right away | 20:56 |
fungi | absolutely | 20:56 |
ianw | clarkb: any particular thoughts on adding afsdb03 @ https://review.opendev.org/c/opendev/system-config/+/777924 ? it seems like a decent way to get us testing focal | 20:56 |
auristor | ianw: how do the DNS SRV records get updated? | 20:58 |
clarkb | ianw: that sounds reasonable to me. Should I approve it? | 20:59 |
clarkb | auristor: that zone is hosted on a dnsaas system and we typically add records to it by hand | 20:59 |
auristor | ack | 20:59 |
ianw | auristor: for openstack.org it's a manual process via a RAX hosted option | 20:59 |
clarkb | ianw: I +2'd it, will let you approve when ready | 21:00 |
auristor | I've +1 | 21:00 |
*** toomer has quit IRC | 21:00 | |
mgagne | IIRC, we might have a thundering herd issue if Nodepool tries to create all of them at the same time, let's see | 21:00 |
ianw | clarkb / auristor : thanks ... i'll do it after school run here; deploy the change, update the SRV records and restart things in quick succession | 21:01 |
*** auristor has quit IRC | 21:02 | |
fungi | mgagne: if you do, it'll probably be the hypervisor hosts all trying to warm their image caches | 21:06 |
fungi | that can overload storage networks rather quickly | 21:06 |
mgagne | yes, that's what I suspect will happen | 21:06 |
mgagne | mainly that Nova and/or Nodepool will give him and instance will go in ERROR state and get deleted/recreated for a while. | 21:07 |
fungi | mgagne: though you may be in luck, we just caught up on our node request backlog in the past few minutes so the count of nodes in use is dropping: https://grafana.opendev.org/d/5Imot6EMk/zuul-status?orgId=1 | 21:07 |
fungi | that may naturally lead to a more gradual utilization | 21:08 |
openstackgerrit | Merged openstack/project-config master: Enable inap-mtl01 region https://review.opendev.org/c/openstack/project-config/+/778064 | 21:12 |
*** auristor has joined #opendev | 21:16 | |
*** slittle1 has joined #opendev | 21:20 | |
*** LowKey has joined #opendev | 21:28 | |
*** toomer has joined #opendev | 21:44 | |
*** sboyron has joined #opendev | 21:53 | |
*** toomer has quit IRC | 22:04 | |
*** whoami-rajat has quit IRC | 22:07 | |
openstackgerrit | Kevin Carter proposed zuul/zuul-jobs master: Make .sh browsable on swift logs https://review.opendev.org/c/zuul/zuul-jobs/+/731795 | 22:22 |
*** sboyron has quit IRC | 22:25 | |
fungi | mgagne: looks like we started building in there a little over an hour ago, some nodes took a few minutes to come active but no errors that i see, we peaked around 60 nodes a few minutes ago: https://grafana.opendev.org/d/tazoteEGz/nodepool-inap?orgId=1 | 22:34 |
fungi | so far so good! | 22:34 |
mgagne | glad to hear it | 22:34 |
kopecmartin | ianw: hi, could you give me access to refstack.openstack.org (similar to the access i have to reftstack01.openstack.org)? I wanna compare a few configs as a last resort | 22:37 |
fungi | kopecmartin: yep, i can copy your public key from reftstack01, just a moment | 22:38 |
kopecmartin | fungi: thanks .. no rush, i need to go to sleep anyway now | 22:38 |
ianw | kopecmartin: ok, is there like a one line summary of where it's at? | 22:39 |
fungi | kopecmartin: what account are you logging into on reftstack01.openstack.org? | 22:39 |
fungi | is it the "ubuntu" account maybe? | 22:39 |
kopecmartin | fungi: kopecmartin account | 22:40 |
* fungi looks again | 22:40 | |
kopecmartin | fungi: 104.239.166.15 | 22:40 |
kopecmartin | that's where refstack01 is | 22:40 |
fungi | kopecmartin: are you sure you're not logging into a held job node you're calling reftstack01.openstack.org? | 22:40 |
kopecmartin | fungi: oh, yeah, it's a held node | 22:41 |
fungi | actual refstack01 in dns is 104.239.144.250 | 22:41 |
ianw | yeah that's the held node from https://etherpad.opendev.org/p/refstack-docker | 22:41 |
ianw | i don't see why we can't give you access to the old server if required, but we might want to do it via cfg mgmt | 22:43 |
fungi | i'm personally okay with the idea of kopecmartin having ssh access to the production server for troubleshooting purposes, but we should add the account the way we... | 22:43 |
fungi | yeah that | 22:43 |
fungi | and get an acknowledgement of the server ssh access policy | 22:44 |
ianw | kopecmartin: just grep "extra_users" in system config and follow along if that's what you'd like to do; of course happy to grab anything required too | 22:46 |
fungi | clarkb: when you get back, see comments on 777846, i expect it's not doing quite what you'll want there | 22:52 |
openstackgerrit | James E. Blair proposed ttygroup/gertty master: Add support for searching for hashtags https://review.opendev.org/c/ttygroup/gertty/+/778088 | 22:57 |
openstackgerrit | Martin Kopec proposed opendev/system-config master: SSH access to refstack for kopecmartin https://review.opendev.org/c/opendev/system-config/+/778090 | 23:04 |
kopecmartin | ianw, fungi: is that ok? ^^ | 23:05 |
ianw | kopecmartin: that won't deploy on the old server, which is what you really want right? | 23:06 |
kopecmartin | ianw: yes | 23:06 |
ianw | yeah, have to look but refstack-docker.yaml is the var file for the new server | 23:06 |
ianw | kopecmartin: also good just to confirm in the changelog you've seen https://docs.opendev.org/opendev/system-config/latest/sysadmin.html#ssh-access | 23:07 |
*** slaweq has quit IRC | 23:07 | |
clarkb | fungi: yes, I mentioned this last week. It's a "more recently than" check. It shouldn't accidentally identify recently used accounts as non-recent, but it may identify non-recent accounts as recently used | 23:08 |
clarkb | fungi: I was thinking we'd start here for now to whittle the list down, then get smarter with the smaller list | 23:08 |
fungi | ahh, yeah got it. if we wanted it to be more accurate, we could loop over the returned results (maybe also with pagination handling), then iterate through the patchsets and comments for each result | 23:09 |
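fungi's suggestion to loop over the returned results with pagination handling could look roughly like the following. This is a minimal sketch, not clarkb's script: it assumes Gerrit's REST `/changes/` endpoint with the `n` (limit), `S` (start) and `o=ALL_REVISIONS`/`o=MESSAGES` options, and `fetch` is a hypothetical callable standing in for the authenticated HTTP GET.

```python
import json
from typing import Callable, Iterator
from urllib.parse import quote

# Gerrit prepends this marker to JSON REST responses to defeat XSSI.
XSSI_PREFIX = ")]}'"


def query_all_changes(fetch: Callable[[str], str], query: str,
                      page_size: int = 100) -> Iterator[dict]:
    """Yield every ChangeInfo matching `query`, one page at a time.

    `fetch` is a hypothetical helper: it takes a REST path and returns
    the raw response body as text.
    """
    start = 0
    while True:
        path = (f"/changes/?q={quote(query)}&n={page_size}&S={start}"
                "&o=ALL_REVISIONS&o=MESSAGES")
        body = fetch(path)
        # Strip the anti-XSSI marker before parsing the JSON list.
        if body.startswith(XSSI_PREFIX):
            body = body[len(XSSI_PREFIX):]
        page = json.loads(body)
        yield from page
        # The last entry carries _more_changes: true when another page exists.
        if not page or not page[-1].get("_more_changes"):
            return
        start += len(page)
```

Because it is a generator, the caller can stop early or stream results into client-side filtering without holding every page in memory.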
clarkb | yup, I think that may be annoyingly slow if we try to do it for the full set now | 23:09 |
clarkb | but if we cut the list in half or something like that then doing that to further reduce the list down makes sense to me | 23:10 |
ianw | kopecmartin: i guess the "refstack" group doesn't actually match the old server, weirdly | 23:10 |
clarkb | fungi: do you agree it should "fail" in the safe manner I assert? and if so any objections to me running that as an admin nowish? | 23:12 |
fungi | yeah, it will potentially consider stale accounts as recently used, but that's safe | 23:12 |
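The "fails safe" property agreed on here can be illustrated with a coarse check based on each change's `updated` timestamp. This is an assumption-laden sketch, not the script that was actually run: a change's `updated` time is never earlier than any single action on it, so using it as a proxy can only over-report an account's recency, never under-report it. The function name is hypothetical.

```python
from typing import Iterable


def looks_recently_used(changes: Iterable[dict], cutoff: str) -> bool:
    """Coarse, fail-safe recency check for one account (hypothetical helper).

    `changes` are the ChangeInfo dicts the account owns or reviewed, and
    `cutoff` is a Gerrit-style "YYYY-MM-DD hh:mm:ss.fffffffff" UTC string.
    Gerrit timestamps are fixed-width, so string comparison orders them
    chronologically. Because a change's `updated` time is at or after the
    account's own last action on it, this may call a stale account recent,
    but should not call a recent account stale.
    """
    return any(change.get("updated", "") >= cutoff for change in changes)
```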
kopecmartin | ianw: ah, let's forget that, I don't need it. I've found a workaround for the problem with the missing api part of the URLs (ProxyPass in the vhost config); i'll propose a patch tomorrow | 23:13 |
fungi | clarkb: also it's not all that slow. i mean the openstack election tooling looks up recent commit activity for a single account in a few seconds | 23:13 |
fungi | and generates entire electoral rolls of >1k accounts based on a year of change activity in maybe half an hour | 23:14 |
clarkb | fungi: I may be missing something, but I don't think there is a way to ask for the most recent comment directly? so you have to lookup all the recent changes and scan them? | 23:14 |
clarkb | the current script takes about 10 minutes to run just being simple | 23:14 |
ianw | kopecmartin: ok, feel free to follow-up and we can make sure it installs your user on the old and new hosts if you want | 23:15 |
fungi | clarkb: yep, that's basically what i've done elsewhere. for loop over the changes returned, tell it to include all patchsets and comments, then for loop over the patchsets and comments | 23:15 |
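The refinement described here (for each returned change, walk its patchsets and comments) might be sketched as below. It assumes ChangeInfo dicts fetched with `o=ALL_REVISIONS` and `o=MESSAGES`, so each revision carries `uploader` and `created` and each message carries `author` and `date`; the helper name is hypothetical.

```python
from typing import Iterable, List, Optional


def last_activity(changes: Iterable[dict], account_id: int) -> Optional[str]:
    """Return the newest timestamp at which `account_id` uploaded a patchset
    or left a message on any of `changes`, or None if it never did.

    Gerrit timestamps are fixed-width "YYYY-MM-DD hh:mm:ss.fffffffff" UTC
    strings, so max() on the raw strings picks the latest one.
    """
    stamps: List[str] = []
    for change in changes:
        # Patchset uploads by this account.
        for rev in change.get("revisions", {}).values():
            if rev.get("uploader", {}).get("_account_id") == account_id:
                stamps.append(rev["created"])
        # Review comments / votes by this account (system messages may lack an author).
        for msg in change.get("messages", []):
            if msg.get("author", {}).get("_account_id") == account_id:
                stamps.append(msg["date"])
    return max(stamps, default=None)
```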
fungi | but as you say, no need to complicate it for now | 23:15 |
clarkb | alright I'll go ahead and run it as admin in a minute | 23:18 |
openstackgerrit | Merged openstack/project-config master: intel-nfv-ci-tests: move zuul definition into repo https://review.opendev.org/c/openstack/project-config/+/777511 | 23:24 |
clarkb | I should've given this a progress bar | 23:35 |
fungi | another thing i've found after writing my third or fourth gerrit data scraper... the trick is to request as much data as you can in the fewest number of queries possible, and then do the rest with client-side filtering/iteration | 23:38 |
fungi | individual requests to the gerrit api incur a ton of overhead | 23:38 |
clarkb | ya in this case I think the number of requests is minimal. It gets the active then inactive accounts. Then it checks for ownership and reviews done by them | 23:39 |
clarkb | I guess I could query more accounts at once | 23:40 |
fungi | not easily with the way you're doing it now though | 23:40 |
fungi | since you're taking advantage of assuming the first returned result is for the one account you're asking about | 23:40 |
clarkb | hrm ya I think you'd have to look up account details in all cases for each of the returned results to get that info anyway | 23:41 |
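Following the "fewest queries, then client-side filtering" advice, ownership at least can be batched: every ChangeInfo already includes the owner's `_account_id`, so several accounts could be queried at once and the results grouped locally instead of assuming the first result belongs to the one account asked about. A sketch with hypothetical helper names; it assumes Gerrit accepts numeric account ids in `owner:` and that a long `OR` query stays within the server's query limits.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def owner_query(account_ids: Iterable[int]) -> str:
    """Build one Gerrit search string matching changes owned by any account."""
    return " OR ".join(f"owner:{aid}" for aid in account_ids)


def group_by_owner(changes: Iterable[dict]) -> Dict[int, List[dict]]:
    """Bucket ChangeInfo dicts by their owner's numeric account id."""
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for change in changes:
        buckets[change["owner"]["_account_id"]].append(change)
    return dict(buckets)
```

Accounts that own nothing matching the query simply have no key in the grouped result.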
clarkb | anyway its done running | 23:42 |
clarkb | I'll put the results on review so others can check them too | 23:42 |
clarkb | infra-root review:~clarkb/gerrit_user_cleanups/external_id_conflict_classifications has the results of that script run | 23:46 |
ianw | 13901,13902,13903,13905,13906 ... that's a series | 23:49 |
openstackgerrit | Merged opendev/system-config master: Add afsdb03 openstack.org https://review.opendev.org/c/opendev/system-config/+/777924 | 23:53 |
clarkb | ianw: ya there are definitely a few in here where you could tell people were struggling | 23:55 |
fungi | some people will clearly try a lot of different ways to get their account to work (or maybe the same way over and over) | 23:56 |
*** gibi has quit IRC | 23:58 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!