ianw | fungi: did you modify gerritlib/gerrit.py around the _read() function on the eavesdrop host? | 00:02 |
fungi | yes | 00:03 |
ianw | adding the "if" | 00:03 |
fungi | i applied https://review.opendev.org/763658 but once that was in it was filling the logs with empty strings so i added the "if l:" around the _read() to silence those | 00:04 |
fungi | i can push up a new patchset of 763658 with that worked in | 00:04 |
ianw | ahh, ok | 00:04 |
ianw | i have the ssh close/retry working, i was just trying to piece together how the code on the system came to be :) | 00:05 |
ianw | maybe just leave it if you like, and i'll put up something we can compare to | 00:06 |
fungi | i've updated 763658 with that now | 00:07 |
fungi | you can also feel free to hack on that change if it helps | 00:07 |
ianw | the problem is it never breaks out of the read. it will be easier to explain in diff format :) | 00:09 |
fungi | yeah | 00:10 |
fungi | i guess we need to return if it's empty? | 00:10 |
fungi | that's as far as i'd gotten before i was too out of it to continue yesterday | 00:11 |
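(A minimal sketch of the fix being discussed here, not the exact gerritlib patch under review; the function and argument names are illustrative. The key point is that a paramiko-style readline() returning an empty string means the remote side closed the connection, so the reader must stop instead of looping on empty reads.)

```python
# Illustrative only -- not the actual change proposed in 763892.
def read_events(stream, handle_event, log):
    """Consume ssh stream-events lines until the server closes the stream."""
    while True:
        line = stream.readline()
        if not line:
            # An empty read means EOF: Gerrit closed the connection.
            # Raise so the caller can tear down and reconnect rather
            # than spinning forever on empty reads.
            raise IOError("remote end closed the ssh connection")
        line = line.strip()
        if line:  # the "if l:" guard -- skip blank lines instead of logging them
            log.debug("Received: %s", line)
            handle_event(line)
```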
*** tosky has quit IRC | 00:17 | |
ianw | fungi: pretty much, just beating it into shape, but i can close the connection from gerrit and have it re-start now | 00:25 |
fungi | ooh nice! | 00:28 |
*** brinzhang has joined #opendev | 00:38 | |
*** openstackgerrit has joined #opendev | 00:39 | |
openstackgerrit | Ian Wienand proposed opendev/gerritlib master: Handle empty reads as closed connections https://review.opendev.org/c/opendev/gerritlib/+/763892 | 00:39 |
ianw | fungi: ^ ... not clear to me if it didn't work before, or something else changed | 00:41 |
*** ralonsoh has quit IRC | 00:46 | |
*** ralonsoh has joined #opendev | 00:46 | |
fungi | ianw: thanks! entirely possible this is a behavior change in newer mina-sshd or something | 00:55 |
openstackgerrit | Ian Wienand proposed opendev/gerritlib master: Handle empty reads as closed connections https://review.opendev.org/c/opendev/gerritlib/+/763892 | 00:59 |
*** mlavalle has quit IRC | 01:26 | |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Pin keystoneauth1 and cachetools on older Python https://review.opendev.org/c/zuul/zuul-jobs/+/763866 | 01:29 |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Use Python 3.x with launchpadlib https://review.opendev.org/c/zuul/zuul-jobs/+/763834 | 01:29 |
*** ysandeep|away is now known as ysandeep | 01:44 | |
*** brinzhang has quit IRC | 02:16 | |
*** iurygregory has quit IRC | 02:25 | |
*** ykarel has joined #opendev | 02:26 | |
*** whoami-rajat__ has quit IRC | 02:31 | |
*** ykarel has quit IRC | 03:19 | |
*** auristor has quit IRC | 03:49 | |
*** auristor has joined #opendev | 03:56 | |
*** raukadah is now known as chandankumar | 04:09 | |
*** amaron has quit IRC | 04:38 | |
*** ykarel has joined #opendev | 04:48 | |
*** ykarel_ has joined #opendev | 04:53 | |
*** ykarel has quit IRC | 04:53 | |
*** ykarel_ is now known as ykarel | 04:54 | |
*** ysandeep is now known as ysandeep|afk | 05:22 | |
openstackgerrit | Ian Wienand proposed opendev/gerritbot master: Build against gerritlib master branch https://review.opendev.org/c/opendev/gerritbot/+/763927 | 05:50 |
*** ykarel has quit IRC | 05:55 | |
ianw | fungi: ^ i think we can incorporate changes with that, and test the container? | 05:57 |
*** marios has joined #opendev | 06:04 | |
*** ykarel has joined #opendev | 06:08 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/763930 | 06:12 |
*** ralonsoh has quit IRC | 06:46 | |
*** ysandeep|afk is now known as ysandeep | 06:57 | |
*** iurygregory has joined #opendev | 07:02 | |
*** marios has quit IRC | 07:11 | |
*** ykarel_ has joined #opendev | 07:22 | |
*** ykarel has quit IRC | 07:25 | |
*** marios has joined #opendev | 07:29 | |
*** DSpider has joined #opendev | 07:36 | |
*** rpittau|afk is now known as rpittau | 07:39 | |
*** ykarel__ has joined #opendev | 07:45 | |
*** ykarel_ has quit IRC | 07:48 | |
*** lpetrut has joined #opendev | 07:53 | |
*** sboyron__ has joined #opendev | 07:58 | |
*** slaweq has joined #opendev | 08:02 | |
*** eolivare has joined #opendev | 08:02 | |
*** ykarel__ is now known as ykarel | 08:05 | |
*** hrw has joined #opendev | 08:09 | |
hrw | morning | 08:09 |
hrw | Did we lose the 'cherrypick' button with the Gerrit upgrade? | 08:09 |
hrw | ah. it is in menu now... | 08:10 |
*** sboyron__ is now known as sboyron | 08:12 | |
*** ralonsoh has joined #opendev | 08:15 | |
*** DSpider has quit IRC | 08:31 | |
*** DSpider has joined #opendev | 08:32 | |
*** andrewbonney has joined #opendev | 08:42 | |
*** mgoddard has joined #opendev | 08:44 | |
*** tosky has joined #opendev | 08:48 | |
*** dtantsur|afk is now known as dtantsur | 09:01 | |
openstackgerrit | Sorin Sbârnea proposed zuul/zuul-jobs master: Add ensure-ansible role https://review.opendev.org/c/zuul/zuul-jobs/+/749706 | 09:04 |
*** hrw has left #opendev | 09:17 | |
*** marios has quit IRC | 09:30 | |
*** iurygregory has quit IRC | 09:37 | |
*** marios has joined #opendev | 09:39 | |
*** iurygregory has joined #opendev | 09:43 | |
*** mgoddard has quit IRC | 10:11 | |
*** fressi has quit IRC | 10:18 | |
sshnaidm | I wonder if there is a way to mark failed jobs in red as it was before? | 10:20 |
sshnaidm | And if possible to see the list of jobs on the main page, instead of looking for them in comments.. | 10:21 |
zbr | sshnaidm: you will have to wait a while, likely a few weeks. | 10:21 |
zbr | the feature needs to be reimplemented and there is other stuff that is more important. | 10:22 |
zbr | sshnaidm: but if you are feeling confident in your JS skills, you could try to update the greasemonkey scripts that add these features and we can start using them without having to update gerrit itself. See https://opendev.org/x/coats | 10:23 |
*** mgoddard has joined #opendev | 10:24 | |
zbr | my guess is that we only need to find the new HTML elements into which to inject the table | 10:25 |
*** mgoddard has quit IRC | 10:35 | |
*** fressi has joined #opendev | 10:48 | |
*** mgoddard has joined #opendev | 10:49 | |
*** ralonsoh has quit IRC | 11:20 | |
*** ralonsoh has joined #opendev | 11:23 | |
*** fressi has quit IRC | 11:32 | |
*** fressi has joined #opendev | 11:33 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Fix "git-review -d" erases work directory if on the same branch as the change downloaded https://review.opendev.org/c/opendev/git-review/+/399779 | 11:50 |
*** ysandeep is now known as ysandeep|brb | 12:00 | |
*** hashar has joined #opendev | 12:00 | |
*** eolivare_ has joined #opendev | 12:09 | |
*** eolivare has quit IRC | 12:12 | |
*** dtantsur is now known as dtantsur|brb | 12:22 | |
*** ykarel is now known as ykarel|away | 12:23 | |
*** ysandeep|brb is now known as ysandeep | 12:25 | |
*** fressi has quit IRC | 12:25 | |
*** slaweq has quit IRC | 12:29 | |
*** slaweq has joined #opendev | 12:31 | |
*** eolivare_ has quit IRC | 12:32 | |
*** ykarel|away has quit IRC | 12:44 | |
*** fressi has joined #opendev | 12:44 | |
*** eolivare_ has joined #opendev | 12:50 | |
*** fressi has quit IRC | 12:59 | |
*** fressi has joined #opendev | 13:01 | |
openstackgerrit | wes hayutin proposed openstack/project-config master: add review-priority for tripleo-ci https://review.opendev.org/c/openstack/project-config/+/715069 | 13:08 |
frickler | infra-root: did we change something in gerrit some time after the upgrade that would cause me to no longer receive mails about zuul results? I did get some after the upgrade, but now I notice some responses missing. I do still receive proper mails about comments and new patchsets | 13:15 |
zbr | frickler: email template changed, you need to update your filters. | 13:28 |
*** lpetrut has quit IRC | 13:36 | |
*** lpetrut has joined #opendev | 13:38 | |
*** dtantsur|brb is now known as dtantsur | 13:41 | |
*** whoami-rajat__ has joined #opendev | 13:44 | |
frickler | zbr: I filter on "From: review@openstack.org", don't think that has changed | 13:45 |
*** mgoddard has quit IRC | 13:46 | |
*** mgoddard has joined #opendev | 13:53 | |
*** pabelanger has left #opendev | 13:57 | |
zbr | frickler: i still receive my emails, so i doubt it is working only for me. but there is a section mentioning changes related to how emails are sent. also now the default is to send HTML unless you change your user config to force plain. | 14:00 |
openstackgerrit | Tristan Cacqueray proposed opendev/system-config master: gerrit: install zuul-results plugin to display the build table https://review.opendev.org/c/opendev/system-config/+/763891 | 14:10 |
tristanC | infra-root: i've tested that zuul-results plugin in a test setup using docker.io/opendevorg/gerrit:3.2 , which seems to work, but it would be good to get it running on review-dev first. Let me know if i can help with that. | 14:21 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/c/zuul/zuul-jobs/+/740935 | 14:22 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/c/zuul/zuul-jobs/+/740935 | 14:25 |
fungi | frickler: i don't believe we changed anything about e-mails after the upgrade (not yet anyway) | 14:35 |
*** gouthamr_ has quit IRC | 14:36 | |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: add review-priority for tripleo-ci https://review.opendev.org/c/openstack/project-config/+/715069 | 14:39 |
frickler | fungi: any idea how to view gerrit logs regarding how it sends mails? the mails that I see in the exim log seem to get delivered just fine | 14:42 |
openstackgerrit | Sorin Sbârnea proposed opendev/elastic-recheck master: WIP: Run elastic-recheck container https://review.opendev.org/c/opendev/elastic-recheck/+/729623 | 14:50 |
openstackgerrit | Thierry Carrez proposed openstack/project-config master: Add release jobs to oslo.metrics https://review.opendev.org/c/openstack/project-config/+/763986 | 14:53 |
fungi | frickler: we can add a mailrouter in exim which delivers copies of everything to a local mbox file | 14:56 |
fungi | or are you not looking for copies of the messages? | 14:57 |
openstackgerrit | Merged zuul/zuul-jobs master: Pin keystoneauth1 and cachetools on older Python https://review.opendev.org/c/zuul/zuul-jobs/+/763866 | 14:57 |
fungi | frickler: if you're looking for mail sending errors from gerrit, looks like those end up in its error_log, look for SendEmail | 14:58 |
*** fressi has quit IRC | 14:59 | |
*** roman_g has joined #opendev | 15:00 | |
frickler | fungi: I was looking for logs for successfully sent mails, but indeed the error log has a lot of "java.lang.NullPointerException: Null email" | 15:06 |
frickler | does the zuul account not have an email address assigned? | 15:06 |
frickler | maybe this is related to the fact that gerrit now tries to include every reviewer in the reply-to for a msg | 15:06 |
frickler | seems there's also a huge number of errors related to stackalytics-bot-2 | 15:09 |
*** ysandeep is now known as ysandeep|away | 15:17 | |
fungi | yeah, saw those too | 15:23 |
frickler | I only find these "Null email" msgs in error_logs after the upgrade. also, besides zuul, there's also fungi's account without an email, at least according to the mouseover in the UI | 15:23 |
frickler | not sure why some mails still get sent at all, then | 15:23 |
fungi | the fungi.admin account? yeah, that may be yet another reason to assign e-mail addresses on accounts like that | 15:24 |
frickler | fungi: actually I think that's the normal account, it just says "fungi", e.g. on https://review.opendev.org/c/zuul/zuul/+/763333/ | 15:25 |
frickler | and the status says "missing, presumed fed" | 15:25 |
fungi | oh, that's where i updated my display name field, that account still has an e-mail address set | 15:26 |
fungi | if you look in preferences there are now separate full name and display name fields | 15:26 |
fungi | i left my full name as Jeremy Stanley but set my display name to fungi (the field was previously blank) | 15:27 |
fungi | i haven't changed the e-mail address on that account | 15:27 |
frickler | fungi: yeah, but the mouseover doesn't show me your email address for some reason, it does do so for the other reviewers except zuul | 15:27 |
fungi | i bet showing the e-mail address is its fallback when display name is null | 15:29 |
fungi | likely almost nobody has set a display name yet | 15:29 |
frickler | fungi: hmm, if I set my display name, the mouseover still shows both the full name and the email for myself | 15:31 |
fungi | interesting | 15:31 |
frickler | anyway, /me needs a break. do we have a meeting about the gerrit update later or no meeting and just tackle issues? | 15:33 |
fungi | i assume we'll have at least a brief meeting at 19:00 for anyone around to recap, though i haven't seen a meeting agenda | 15:34 |
frickler | clarkb mentioned something about not doing an agenda yesterday, but I wasn't sure whether that would mean no meeting, too | 15:34 |
frickler | bbl | 15:35 |
fungi | ahh, yeah, i'm good either way | 15:35 |
*** lpetrut has quit IRC | 15:35 | |
openstackgerrit | Jay Faulkner proposed ttygroup/gertty master: Fix auth-type in opendev example config https://review.opendev.org/c/ttygroup/gertty/+/763890 | 15:36 |
clarkb | frickler: fungi I meant no agenda so that we could recap the upgrade during our meeting | 15:43 |
clarkb | so still meeting but more low key | 15:43 |
fungi | wfm | 15:44 |
zbr | clarkb: i guess that the high load after upgrade explains the long push times (20-40s) and random delays in page load times, 5-15s, sometimes even timing out. | 15:51 |
clarkb | that is my hunch | 15:52 |
clarkb | previous to the upgrade we saw this from people scanning the gerrit api for change history | 15:52 |
clarkb | so this isn't necessarily an upgrade specific issue, but it likely does need further debugging | 15:52 |
zbr | what is not clear to me yet is whether this is the rush after the break or if we really have a notable regression in performance. | 15:52 |
clarkb | the next step is likely for gerrit admins to check the java melody dashboard | 15:54 |
clarkb | as that gives us a bit more info on what internally is consuming resources and taking time | 15:55 |
zbr | that graph does not paint a picture regarding number of pages served or average page serve time. | 15:55 |
zbr | i will not be surprised if admins find that the performance hit is caused by a bunch of bots and not humans | 15:56 |
mwhahaha | my gertty is getting read timeouts on a fairly frequent basis (likely related to the load stuff) | 15:56 |
clarkb | zbr: correct, it's system level metrics. java melody gives us finer detail, but unfortunately cannot be exposed publicly because it allows you to kill threads | 15:56 |
zbr | melody is quite nice and useful | 15:56 |
fungi | ...and dangerous | 15:56 |
* zbr could say the same about alcohol,... hmm, that gave him an idea. | 15:57 | |
fungi | heh | 15:57 |
clarkb | historically the things we've seen cause problems are GC'ing being slow (I don't think this is a likely cause here because memory use is low) and people running expensive queries often | 16:00 |
clarkb | melody should give us insight to both of those or at least further hints. I have meetings and then will see if I have time to look closer | 16:00 |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: add review-priority for tripleo-ci https://review.opendev.org/c/openstack/project-config/+/715069 | 16:03 |
*** rosmaita has joined #opendev | 16:04 | |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: DNM - testing registry jobs https://review.opendev.org/c/zuul/zuul-jobs/+/764006 | 16:09 |
clarkb | there are multiple stackalytics accounts producing errors in the gerrit error_log due to Stream is already closed connection reset by peer | 16:09 |
clarkb | I wonder if gerrit may continue to process those queries after the connection closed and we pile them up (something to look at in melody) | 16:09 |
*** hashar is now known as hasharAway | 16:11 | |
clarkb | fungi: I guess the process now is to add our normal account to admins temporarily to view the java melody stuff? /me works on that now | 16:12 |
fungi | oh, yep that's probably necessary unless we want to create a separate acl/group for melody | 16:12 |
*** mlavalle has joined #opendev | 16:13 | |
clarkb | garbage collection looks fine. Active threads is "high" | 16:14 |
clarkb | relative to what it was like yesterday | 16:15 |
clarkb | memory looks stable too | 16:15 |
clarkb | sorting threads by cpu time my suspicions about ssh queries seem valid | 16:17 |
clarkb | but it is also some zuuls as well I think. Notably our zuul does not show up there. I think because we use http for those queries | 16:18 |
clarkb | fungi: can you add yourself to admins as well and take a look? make sure I'm not missing anything | 16:19 |
clarkb | ah except I think these threads are reused | 16:20 |
clarkb | so specific queries aren't necessarily bad, more that we're spending a lot of time in ssh | 16:20 |
clarkb | ya refreshing the list we see that the threads remain but the queries change or go away | 16:21 |
clarkb | I also notice that send email comments is another cpu consumer and gerrit show-queue shows we may only have one thread sending email | 16:22 |
fungi | the error log is spammed by errors related to ssh sessions for stackalytics | 16:22 |
fungi | wondering if it could be related | 16:22 |
clarkb | fungi: ya, I think they may be contributing to the cpu time build up as they query over and over and fail | 16:22 |
clarkb | I think we should try increasing sendemail.threadPoolSize from its default value of 1 as well | 16:22 |
fungi | not a terrible idea if there's a backlog | 16:23 |
clarkb | fungi: ya checkout gerrit show-queue | 16:23 |
fungi | oh indeed | 16:23 |
fungi | could also be related to the bug you discovered with watchers of all-projects getting spammed | 16:24 |
clarkb | ya | 16:24 |
clarkb | I'm thinking lets start by increasing that threadPoolSize value and see if things end up being happier | 16:24 |
clarkb | separately I'll go through the list I generated of people with those watches and maybe we start disabling them? | 16:24 |
clarkb | actually lets do one thing at a time | 16:24 |
clarkb | to have a clearer picture of what is going on. threadPoolSize is easy so lets start there | 16:25 |
clarkb | fungi: for sendemail do you think ~4 is enough or maybe should be 8 ? | 16:25 |
fungi | seeing if i can make an educated guess based on queue size and times | 16:26 |
fungi | looks like it's nearly half an hour behind and has over 100 tasks waiting | 16:27 |
*** roman_g has quit IRC | 16:27 | |
fungi | spends a while (15 seconds maybe?) on each of those | 16:29 |
fungi | some seem to go faster than others, presumably determined by how many recipients there are | 16:29 |
clarkb | seems likely | 16:29 |
clarkb | oh you know what we can do is set that other flag I found | 16:30 |
clarkb | but may do that after threads | 16:30 |
fungi | the backlog is growing rather than shrinking, so i think on average it's taking longer to process each one than the frequency at which new entries are being enqueued | 16:31 |
fungi | fairly slowly though | 16:31 |
clarkb | change.sendNewPatchsetEmails | 16:31 |
fungi | so yeah a small thread count increase is likely sufficient for it to catch back up | 16:32 |
fungi | 4 seems like plenty | 16:32 |
clarkb | cool. Lets start there, if we continue to backlog lets look at change.sendNewPatchsetEmails next (set that to false) then if still sad we look at cleaning up project watches? | 16:32 |
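(For context, the two gerrit.config knobs named above would look roughly like the following; the values shown are illustrative rather than the exact settings applied on review.opendev.org.)

```
[sendemail]
    # default is 1; a few threads lets the email queue drain in parallel
    threadPoolSize = 4
[change]
    # fallback if the backlog persists: stop emailing owners/reviewers
    # about every new patchset
    sendNewPatchsetEmails = false
```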
clarkb | change incoming | 16:32 |
*** chandankumar is now known as raukadah | 16:32 | |
fungi | i can imagine some folks like to get e-mail notification of new patchsets for existing changes | 16:32 |
sshnaidm | fungi, clarkb do you know if the main page in the new gerrit UI is customizable? I wonder if it's possible to see job results on the main page without looking for them in comments. | 16:33 |
fungi | sshnaidm: it is customizable by creating plugins | 16:33 |
fungi | gerrit has some excelent developer docks about writing ui plugins | 16:34 |
fungi | er, docs | 16:34 |
sshnaidm | fungi, java? | 16:34 |
clarkb | sshnaidm: java and javascript | 16:34 |
clarkb | sshnaidm: tristanC has started work on that | 16:34 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Increase gerrit sendemail thread pool size https://review.opendev.org/c/opendev/system-config/+/764019 | 16:34 |
sshnaidm | clarkb, great | 16:34 |
sshnaidm | tristanC++ | 16:35 |
clarkb | as a note we totally told people about this stuff like 6 weeks ago and asked for help then too :P | 16:35 |
clarkb | fungi: think we should manually apply ^ then restart then approve it? | 16:35 |
clarkb | fungi: I'm leaning towards yes | 16:35 |
fungi | clarkb: yes, i concur | 16:36 |
fungi | sshnaidm: https://review.opendev.org/Documentation/pg-plugin-dev.html is the documentation for it btw | 16:37 |
*** roman_g has joined #opendev | 16:37 | |
clarkb | fungi: ok I'm in the root screen going to edit the config now if you want to check me | 16:38 |
fungi | oh, also someone commented on the webapp route bug saying they think it's reasonable to migrate away from using /x/* for plugins... sort of not the answer i was hoping for | 16:38 |
clarkb | ya :/ | 16:39 |
clarkb | fungi: config is updated if it looks good to you | 16:39 |
fungi | clarkb: yep, looks like what we discussed | 16:39 |
clarkb | fungi: should i down then up -d now? | 16:39 |
clarkb | actually maybe send a notice first? | 16:40 |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: Refresh intermediate TLS certs for testing https://review.opendev.org/c/zuul/zuul-jobs/+/764023 | 16:40 |
fungi | status notice The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an SMTP queuing backlog, downtime should be less than 5 minutes | 16:40 |
fungi | clarkb: ^ like that? | 16:40 |
* fungi was already drafting it | 16:40 | |
clarkb | fungi: ++ | 16:40 |
fungi | #status notice The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an SMTP queuing backlog, downtime should be less than 5 minutes | 16:40 |
openstackstatus | fungi: sending notice | 16:40 |
clarkb | fungi: good to down up -d now? | 16:40 |
-openstackstatus- NOTICE: The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an SMTP queuing backlog, downtime should be less than 5 minutes | 16:41 | |
fungi | yep | 16:41 |
fungi | go for it | 16:41 |
clarkb | done | 16:41 |
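(The quick restart amounts to something along these lines; the compose directory path is an assumption, not taken from the log.)

```
cd /etc/gerrit-compose   # assumed location of the gerrit docker-compose.yaml
docker-compose down && docker-compose up -d
```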
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: Switch to container_images for push-to-intermediate-registry https://review.opendev.org/c/zuul/zuul-jobs/+/763836 | 16:42 |
clarkb | for anyone following along I think our next step is to set change.sendNewPatchsetEmails to false and restart (I think this may have increased the amount of email we send) and if that still is sad look at cleaning up all-projects watches like what cgoncalves had | 16:43 |
clarkb | and basically monitor between each step so we can see if things are improving | 16:43 |
clarkb | show-queue is currently much happier but that is to be expected after a restart | 16:43 |
openstackstatus | fungi: finished sending notice | 16:44 |
fungi | if we disable sendNewPatchsetEmails i think we ought to consider it temporary until the all-projects watch filtering bug is fixed | 16:44 |
clarkb | fungi: well I think that is a new feature | 16:45 |
fungi | because as i said, i wouldn't be surprised if some folks rely on e-mail notification of new patchsets for changes they're reviewing | 16:45 |
clarkb | before you only got emailed if you explicitly watched things | 16:45 |
clarkb | but I may be wrong about that | 16:45 |
fungi | oh, maybe i should reread what that option really does then | 16:45 |
clarkb | its an implied watch system for changes you own and review | 16:45 |
clarkb | I'm not 100% sure if 2.13 did that by default too | 16:45 |
clarkb | but I don't seem to remember getting those emails if it did | 16:45 |
fungi | i recall 2.13 sending e-mail notification of each new patchset uploaded, because i explicitly filtered those into a separate trash inbox | 16:46 |
*** hasharAway is now known as hashar | 16:46 | |
cgoncalves | clarkb, I can confirm that deleting *both* notifications settings I had on at the top of the list stopped the notification influx. just to reinforce: *both* need to be deleted | 16:47 |
*** roman_g has quit IRC | 16:48 | |
clarkb | cgoncalves: yup, fungi and I ended up spending some time yesterday working to reproduce it to understand it better | 16:48 |
clarkb | cgoncalves: I also believe it is an upstream bug | 16:48 |
clarkb | which I intend on filing as soon as things settle enough to do so | 16:48 |
clarkb | mwhahaha: zbr: do you notice a difference now? looking at metric data it seems happier | 16:51 |
mwhahaha | will keep an eye on it, not currently timing out | 16:52 |
mwhahaha | seems snappy, was having some hangs earlier | 16:52 |
fungi | granted, it may simply have cleaned up/disconnected/reset whatever was dragging it down, and it will begin to bog down again as it settles | 16:54 |
clarkb | ya to be seen if this was the fix we needed. I did check there are more email threads running in melody | 16:54 |
clarkb | so the config option stuck | 16:54 |
clarkb | fungi: I think we should approve the change I pushed/ | 16:55 |
clarkb | fungi: re the /x/* bug the good news is they've confirmed the minimal usage of that path so we really shouldn't have conflicts with our current setup | 16:56 |
fungi | i just did | 16:56 |
clarkb | thanks! | 16:56 |
fungi | and yeah, i followed up with another comment on that bug | 16:56 |
mwhahaha | hrm tried abandoning via the ui and it's just spinning | 17:00 |
*** timburke has quit IRC | 17:00 | |
mwhahaha | that took a while | 17:00 |
clarkb | hrm ya load is climbing again and looks like ssh threads are busy again. So email is likely a symptom and not a cause | 17:01 |
*** rpittau is now known as rpittau|afk | 17:01 | |
tristanC | clarkb: sshnaidm: i think the work is completed for the zuul result table, https://review.opendev.org/c/opendev/system-config/+/763891 should be enough | 17:01 |
clarkb | fungi: https://gerrit-documentation.storage.googleapis.com/Documentation/2.13.13/config-gerrit.html#change doesn't show that new setting. Could still be it was unconfigurable default | 17:01 |
clarkb | tristanC: ok, we're unlikely to deploy that in the near future. We're still battling issues and its a holiday week and all that | 17:02 |
fungi | total connection count is not super high, ~128 | 17:03 |
sshnaidm | tristanC, can we include showing running jobs also? Like in https://opendev.org/x/coats/src/branch/master/coats/openstack_gerrit_zuul_status.user.js | 17:03 |
clarkb | I wonder if it is contention over locks for resources? We get queries like `gerrit query --format json --all-approvals --comments --commit-message --current-patch-set --dependencies --files --patch-sets --submit-records 737864` from a number of CI systems all asking for the same content on the same change | 17:03 |
clarkb | and then they all go away and load drops back down again | 17:04 |
clarkb | sshnaidm: I think we should avoid that | 17:04 |
clarkb | sshnaidm: last time we tried it it killed zuul | 17:04 |
*** marios is now known as marios|out | 17:04 | |
sshnaidm | clarkb, wow | 17:04 |
clarkb | basically let's do one thing at a time and ensure things are happy | 17:04 |
sshnaidm | clarkb, agree to try one by one | 17:04 |
sshnaidm | clarkb, I'm fine to keep using it with greasemonkey | 17:05 |
fungi | sshnaidm: we'd want to test it carefully. what tends to happen is people leave fifty gerrit changes open in different browser tabs and when a few hundred users do that all those queries beating on the zuul api eat up a ton of resources | 17:05 |
sshnaidm | fungi, I see | 17:06 |
clarkb | mwhahaha: ya looks like we're almost idle now. So it seems bursty when CI systems are fetching change info. Still not sure if the CI systems are the cause (perhaps using http like zuul is better for some reason) or a symptom of something else going on | 17:06 |
clarkb | we'll have to continue to monitor it I guess | 17:06 |
mwhahaha | yea not getting gertty time outs at the moment, so it might just be one of those things for now | 17:07 |
clarkb | could even be contention between email event thread locks and query locks from the CI side and flushing email more quickly results in less contention. Lots of things could be going on. Hopefully melody and further observation will allow us to better characterize it | 17:07 |
clarkb | bunch of CI queries for 763997 currently | 17:08 |
*** roman_g has joined #opendev | 17:09 | |
clarkb | if we had better coordination with CI systems it would be nice to flip them over to http and see if things change at all | 17:09 |
clarkb | in particular upstream zuul's queue lengths as reported by the dashboard are basically empty | 17:09 |
fungi | interestingly, even though the system load average is relatively high, cpu is still mostly idle | 17:09 |
clarkb | fungi: ya that is why I suspect locks? | 17:10 |
clarkb | iowait is nil | 17:10 |
clarkb | (double check me on that) | 17:10 |
fungi | it is, yes | 17:10 |
clarkb | also zuul uses http and if you look at its dashboard it reports empty queues (implying to me that it isn't waiting for long periods for this info) | 17:10 |
fungi | interrupt handling is relatively minimal as well | 17:10 |
*** roman_g has quit IRC | 17:11 | |
clarkb | I'm beginning to suspect that those ssh queries that zuul does over ssh are not very efficient for some reason | 17:11 |
fungi | still no smtp backlog, ssh connection count has climbed slightly | 17:11 |
clarkb | and melody reports ssh workers are dominating the time again (and show-queue seems to imply the time there is spent doing those ci queries) | 17:12 |
fungi | sendemail threads are occasionally complaining about "Account ... has invalid filter in project watch" | 17:13 |
clarkb | the tracebacks for the ssh threads from those ci systems show they are doing external id lookups in jgit | 17:15 |
clarkb | Now I had thought that this is what the account index is for | 17:16 |
fungi | are ssh keys external ids now? | 17:16 |
clarkb | no | 17:16 |
fungi | granted the username has always been considered an external id | 17:16 |
clarkb | they are in ergular account info | 17:16 |
clarkb | this could just be looking up data to go with the comments and stuff that the CI system is querying | 17:17 |
fungi | or at least username used to be stored in the account_external_ids table in the old db | 17:17 |
clarkb | the traceback does show it going through a cache layer though | 17:17 |
clarkb | the caches are h2 db's iirc | 17:17 |
clarkb | so basically this implies to me that we aren't hitting the account index nor the cache for some reason | 17:18 |
clarkb | perhaps we're just stable because we restarted? | 17:18 |
clarkb | or maybe there is a bug (and http doesn't have it and that is why zuul.opendev is happier?) and they aren't properly looking up in those areas from ssh? | 17:18 |
clarkb | it wouldn't surprise me considering that upstream doesn't enable sshd | 17:18 |
fungi | the historical system load graph in cacti indicates that load for the past two days is an order of magnitude higher than normal, and in line with when we were handling the research crawler's queries | 17:20 |
clarkb | next time we have a number of those threads I'll use melody to get a thread dump then maybe we can file a bug and see what upstream thinks | 17:20 |
fungi | or higher | 17:20 |
clarkb | but I do find it odd they seem to be spending a bunch of time with account stuff when that should all be indexed and cached. | 17:20 |
clarkb | another approach we can try is reach out to these ci systems and ask them to use http for those lookups | 17:22 |
clarkb | and see if that changes things | 17:22 |
fungi | unless we need to reindex after the file lock got unacquired for a bit yesterday and it was failing to update the index? | 17:22 |
clarkb | sean-k-mooney: ^ you're in the list :) maybe you're willing to be our first test of that | 17:22 |
clarkb | fungi: hrm that seems like it could be possible | 17:23 |
clarkb | fungi: do you want to trigger an online reindex for accounts? | 17:23 |
clarkb | https://gerrit-review.googlesource.com/Documentation/cmd-index-start.html | 17:23 |
fungi | yeah, just revisiting the syntax for it now | 17:24 |
fungi | thanks | 17:24 |
sean-k-mooney | what do i have to do? | 17:24 |
clarkb | sean-k-mooney: one sec I'll find details | 17:24 |
fungi | clarkb: "Nothing to reindex, index is already the latest version" maybe i should add --force? | 17:25 |
sean-k-mooney | is it my sean-k-mooney account or sean-mooney-ci account | 17:26 |
clarkb | sean-k-mooney: the ci account, its from zuul | 17:27 |
fungi | i went ahead and did it again with --force | 17:27 |
clarkb | fungi: k | 17:27 |
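(The online reindex being run here is the documented admin SSH command, roughly as below; the port is the usual Gerrit default and the account name is illustrative.)

```
ssh -p 29418 admin@review.opendev.org gerrit index start accounts --force
```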
clarkb | sean-k-mooney: its doing queries on changes to see change attributes and we're theorizing that this may be slow because its bypassing indexes and caches | 17:27 |
clarkb | sean-k-mooney: but still trying to collect more data | 17:27 |
sean-k-mooney | ah i see well i still have the ci in debug mode because i didn't get around to turning that off | 17:28 |
sean-k-mooney | would logs help? | 17:28 |
clarkb | sean-k-mooney: no I think this is normal zuul operations | 17:28 |
clarkb | sean-k-mooney: but we're thinking that if you use http to query this stuff it may be happier because our zuul doesn't seem to exhibit these issues (it uses http) | 17:28 |
clarkb | sean-k-mooney: trying to get our config to show you what we do | 17:29 |
sean-k-mooney | sure i can try swapping it over | 17:29 |
fungi | sean-k-mooney: https://zuul-ci.org/docs/zuul/reference/drivers/gerrit.html#attr-%3Cgerrit%20connection%3E.password | 17:29 |
clarkb | oh thanks | 17:29 |
clarkb | apparently our config is secret because it has secrets | 17:29 |
clarkb | that is why I couldn't find it on gitea | 17:30 |
sean-k-mooney | we need to use basic auth or digest? | 17:30 |
fungi | basic now | 17:30 |
sean-k-mooney | and just remove the sshkey line | 17:30 |
clarkb | sean-k-mooney: no, it will still use ssh for other things | 17:30 |
clarkb | like stream events | 17:30 |
fungi | i think you still need the sshkey for the event streamm right | 17:31 |
clarkb | but it will prefer http for everything that it can do over http | 17:31 |
sean-k-mooney | ah ok | 17:31 |
fungi | yeah, gerrit has never added an event stream over http | 17:31 |
fungi | though the checks plugin was sort of getting there | 17:32 |
clarkb | fungi: looks like show queue shows the reindex is done? I wonder if the log will tell us if it was happy | 17:33 |
fungi | [2020-11-24T17:32:19.832+0000] [Reindex accounts v11-v11] INFO com.google.gerrit.server.index.OnlineReindexer : Reindex accounts to version 11 complete | 17:33 |
fungi | [2020-11-24T17:32:19.832+0000] [Reindex accounts v11-v11] INFO com.google.gerrit.server.index.OnlineReindexer : Using accounts schema version 11 | 17:33 |
clarkb | ya just found it too | 17:33 |
fungi | looks right | 17:33 |
clarkb | cool, I guess we see if those tracebacks come back | 17:34 |
sean-k-mooney | full reconfigure enough or should i do a full service restart | 17:34 |
clarkb | er thread states with tracebacks in melody | 17:34 |
clarkb | sean-k-mooney: I think a restart for anything in the ini config | 17:34 |
clarkb | reconfigure will do the yaml side | 17:34 |
sean-k-mooney | cool ill do that so | 17:34 |
clarkb | I'm being dragged to breakfast. I've got the pre-reindex thread dumps in my browser and can get post-reindex thread dumps if they persist | 17:34 |
sean-k-mooney | i can pick up my debug logs changes too then :) | 17:34 |
clarkb | but then maybe file the cgoncalves bug, this issue, then meeting | 17:35 |
clarkb | sean-k-mooney: if you haven't restarted yet maybe wait a minute | 17:35 |
clarkb | I can double check our config for any other flags that may be necessary | 17:36 |
clarkb | nope looks like the http token is the only thing | 17:36 |
sean-k-mooney | i have just stopped the scheduler | 17:36 |
clarkb | no worries I think you're good | 17:36 |
sean-k-mooney | i can put my config in an etherpad if you want to edit it | 17:37 |
clarkb | no I think that was all you need | 17:37 |
clarkb | I just couldn't remember if there was another flag to say prefer http | 17:37 |
clarkb | but that config option does it | 17:37 |
sean-k-mooney | cool ill start it again so. i did not see anything else that seemed related | 17:37 |
clarkb | ++ | 17:37 |
clarkb | and now really stepping away for sustenance | 17:38 |
sean-k-mooney | should be running let me know if you want me to do anything else | 17:40 |
sean-k-mooney | it might be just because its starting up but the log output seems faster as if its processing updates to the repos faster | 17:41 |
sean-k-mooney | oh its used for comment reporting and is needed for the line-in-file reporting like the pep8 job does, cool | 17:45 |
fungi | yup | 17:47 |
fungi | looks like system load average on review.o.o is climbing again | 17:49 |
fungi | and now falling again. that was relatively brief | 17:49 |
sean-k-mooney | did ye do a restart a few minutes ago | 17:50 |
sean-k-mooney | or was that an hour ago | 17:50 |
*** marios|out has quit IRC | 17:50 | |
fungi | sean-k-mooney: at 16:41z so yes a little over an hour ago | 17:51 |
sean-k-mooney | ok was going to suggest maybe it was CIs reconnecting or something | 17:51 |
sean-k-mooney | zuul is quite chatty as its loading all its config initially | 17:52 |
sean-k-mooney | after that its basically just reactive to the gerrit event stream | 17:52 |
*** iurygregory has quit IRC | 17:59 | |
*** dtantsur is now known as dtantsur|afk | 18:01 | |
fungi | well, we were seeing it before the restart too. one of the things we were hoping might clear up with the config change we made to increase smtp threads | 18:02 |
*** fressi has joined #opendev | 18:05 | |
*** mgoddard has quit IRC | 18:07 | |
tristanC | sshnaidm: i think that can be added, but right now the implementation is really simple, it just displays build results in a table under the commit message, from the zuul comment | 18:09 |
sshnaidm | tristanC, that's fine, I think we can tweak it to work with your plugin together | 18:09 |
tristanC | though we don't know where the zuul rest api is from the gerrit comment alone, so to display a build progress bar we need a configuration option to indicate where zuul is running | 18:09 |
sshnaidm | tristanC, anyway I use it in browser scripts | 18:10 |
tristanC | i wrote that zuul-results plugin using the https://gerrit-review.googlesource.com/Documentation/js-api.html api | 18:10 |
fungi | clarkb: load average has bottomed out (i saw it below 4 a few moments ago). noticing that show-queue -w does include the usernames at the ends of queries so can provide a bit of a snapshot of what's interacting at any point in time. arista-test seems to be almost always present | 18:13 |
fungi | looks like it might be cloning nova? | 18:13 |
clarkb | ya that looks like a clone | 18:15 |
fungi | that account has had a "git-upload-pack /openstack/nova.git" for 8 minutes already according to the queue timestamps | 18:15 |
clarkb | nova is >1GB now iirc. Hopefully they're caching it | 18:15 |
clarkb | fungi: I wonder then if the account index lock issue was kicking us out to direct access on those queries, given that things are happier now | 18:15 |
fungi | gerrit show-queue -w|grep ')$'|sed 's/.* (\(.*\))$/\1/'|sort|uniq -c|sort -n | 18:15 |
fungi | provides an interesting view on account use at different points in time | 18:16 |
fungi | okay, arista-test finished the nova clone | 18:16 |
fungi | or it was terminated somehow | 18:16 |
clarkb | melody does seem to indicate things are largely happy? | 18:17 |
fungi | no user-related tasks in the queue a moment ago when i checked | 18:17 |
clarkb | I'll remove myself from admins now | 18:17 |
clarkb | and if this continues to be happy we can attach the thread dumps I got to the account index bug? | 18:17 |
fungi | worried this is just a sample of what we'll see next week when so many people aren't vacationing | 18:18 |
clarkb | fungi: based on zuul queues we're pretty active right now | 18:18 |
clarkb | 133 changes in check and 43 in gate | 18:18 |
clarkb | thats pretty normal | 18:18 |
clarkb | but also I can't tell if you think things are happier now or less happy :) | 18:19 |
*** andrewbonney has quit IRC | 18:19 | |
fungi | i try not to get my hopes up ;) | 18:20 |
clarkb | if it spikes again I think the thing to do is get into melody and get the threads as a txt dump | 18:20 |
clarkb | that includes the stacks for each thread and you can see if we're spending time in jgit for externalids again | 18:20 |
clarkb | but doesn't seem like we're doing that now at least so maybe that issue was resolved by the reindex | 18:20 |
fungi | mmm... maybe. i did observe system load climb up to almost 40 around 17:49z but only for a few minutes | 18:23 |
clarkb | fungi: I put the stacks in review.o.o:~clarkb/gerrit-3.2-slow-external-id-lookups for when I caught it an hour ago or whatever it was | 18:25 |
clarkb | my hunch since things have been quieter since is that the account reindex did help | 18:25 |
*** yumiriam has joined #opendev | 18:26 | |
clarkb | and if that holds up I think we clean up ^ that file to remove account info and then attach it to your existing bug as a related thing. If we see the issue return without index problems then we create a new bug? | 18:26 |
fungi | also doesn't seem to have lost the lock on the accounts index again yet | 18:26 |
*** iurygregory has joined #opendev | 18:27 | |
clarkb | because ya I wonder if that state marked the index and cache as dirty so we went to disk for everything | 18:27 |
fungi | yeah, i suppose it could be helpful on the existing bug, though it's more of a symptom than anything | 18:27 |
clarkb | and it was probably locking around those reads | 18:27 |
fungi | though if so, it's surprising that reindex without --force said there was nothing to do | 18:27 |
clarkb | fungi: I think that is only checking if the index itself says it is done | 18:27 |
openstackgerrit | Merged opendev/system-config master: Increase gerrit sendemail thread pool size https://review.opendev.org/c/opendev/system-config/+/764019 | 18:32 |
*** yumiriam has quit IRC | 18:36 | |
clarkb | https://bugs.chromium.org/p/gerrit/issues/detail?id=13733 has been filed upstream about cgoncalves issue | 18:37 |
clarkb | fungi: re the index force thing I'm fairly certain it is just checking the schema version | 18:40 |
*** eolivare_ has quit IRC | 18:45 | |
fungi | ahh, okay | 18:48 |
*** whoami-rajat__ has quit IRC | 18:53 | |
openstackgerrit | Alec Hothan proposed openstack/project-config master: Enable python 3.6 job only for x/vmtp project. https://review.opendev.org/c/openstack/project-config/+/764054 | 19:19 |
openstackgerrit | Merged zuul/zuul-jobs master: Use Python 3.x with launchpadlib https://review.opendev.org/c/zuul/zuul-jobs/+/763834 | 19:23 |
openstackgerrit | Merged opendev/gerritlib master: Handle empty reads as closed connections https://review.opendev.org/c/opendev/gerritlib/+/763892 | 19:28 |
*** DSpider has quit IRC | 19:34 | |
clarkb | fungi: arista seems to be cloning nova again | 19:37 |
clarkb | I wonder if they are doing authenticated clones to gerrit for every job or something like that | 19:37 |
fungi | wouldn't surprise me | 19:38 |
ianw | it's definitely these ssh threads | 19:40 |
ianw | if you strace on one of the threads you can see what it's up to | 19:40 |
ianw | stat("/var/gerrit/git/All-Users.git/HEAD", {st_mode=S_IFREG|0644, st_size=22, ...}) = 0 | 19:44 |
ianw | read(372, "ref: refs/meta/config\n", 22) = 22 | 19:44 |
ianw | read(372, "", 22) = 0 | 19:44 |
ianw | close(372) = 0 | 19:44 |
fungi | so i guess it's busy mapping user info for the change details? | 19:45 |
ianw | ... over and over and over | 19:45 |
clarkb | ianw: if you can characterize it even a little bit I bet that wouldbe worth an upstream bug | 19:48 |
clarkb | ianw: also call out that similar queries from CI systems over http don't seem to have this issue | 19:48 |
clarkb | maybe ssh is simply bypassing a cache that it shouldn't | 19:48 |
clarkb | and upstream doesn't notice because they disable ssh | 19:48 |
mnaser | are we talking about potentially an issue with slow gerrit ops? | 19:49 |
clarkb | mnaser: we've been talking about it all day :P | 19:49 |
clarkb | mnaser: the tldr seems to be that ci systems doing change info lookups over ssh (but not http) induces slowness | 19:49 |
mnaser | clarkb: ah, that would be a hard one to sort out... | 19:49 |
clarkb | and that seems to be related to the ssh path doing a lot more git lookups than the http one | 19:50 |
clarkb | so our zuul does a lookup quickly and its fine. But third party CIs dont | 19:50 |
mnaser | for me the symptoms i notice is on what seems to be "write" ops, they seem to take a while | 19:50 |
mnaser | so viewing a change is fine, but reviewing one / abandoning / etc all take time | 19:50 |
clarkb | mnaser: I think locking is involved. We see basically no iowait and low cpu use for the system load but system load ends up fairly high | 19:51 |
clarkb | and for reading you probably don't need locks | 19:51 |
clarkb | for writing I bet it does | 19:51 |
clarkb | s/for the system load/compared to system load/ | 19:51 |
mnaser | is gerrit running on ssds? | 19:52 |
clarkb | mnaser: it is running on an "ssd" cinder volume | 19:52 |
clarkb | so as far as we are able to say: yes | 19:52 |
mnaser | gotcha | 19:52 |
mnaser | haha :) | 19:52 |
fungi | openstack doesn't lie | 19:52 |
clarkb | mnaser: fwiw you reviewed the change that is causing all the problems right now | 19:52 |
clarkb | :) | 19:52 |
clarkb | https://review.opendev.org/c/openstack/magnum/+/763997/ that one is what a number of ci systems are doing lookups against | 19:53 |
clarkb | unfortunately the pushing of three patchsets in quick succession there made it worse | 19:53 |
mnaser | i wonder if ci systems are doing this lookup on every comment | 19:53 |
clarkb | because each one has to be queried | 19:53 |
clarkb | mnaser: no its the patchset created events | 19:53 |
clarkb | I think, I haven't fully tracked it back to the zuul code | 19:53 |
mnaser | wow so a whole 20 minutes after | 19:53 |
clarkb | gerrit query --format json --all-approvals --comments --commit-message --current-patch-set --dependencies --files --patch-sets --submit-records | 19:54 |
clarkb | mnaser: ya because its doing it for each patchset | 19:54 |
clarkb | and each one is slow | 19:54 |
mnaser | we probably don't really have a record of how quick this thing was before | 19:54 |
mnaser | cause we never used it anyways.. | 19:54 |
clarkb | it? | 19:54 |
mnaser | ssh lookups | 19:54 |
clarkb | they were always used before by these CI systems | 19:54 |
mnaser | you mentioned opendev's zuul relied on http looups | 19:54 |
mnaser | lookups* | 19:54 |
clarkb | but the bulk of this info was coming out of the sql database | 19:55 |
clarkb | so the code paths are completely different now | 19:55 |
mnaser | wouldn't both ssh and http implementations slow down though if notedb was the case? | 19:55 |
mnaser | (sorry, i'm sure this conversation has already been had today) | 19:55 |
mnaser | (this is for my selfish curiosity so if y'all are busy debugging, feel free to leave all my questions for later :p) | 19:56 |
fungi | yes recent zuuls configured with an http password don't query change details over ssh, but those configured without one or zuul v2 or other ci systems which only used ssh seem to be what's putting heavy load on things currently | 19:56 |
mnaser | s/case/cause/ | 19:56 |
clarkb | mnaser: my hunch is the http path is using cache and indexes more aggressively | 19:56 |
clarkb | mnaser: using that example change though our zuul started running jobs for ps4 almost immediately | 19:57 |
fungi | and it looks like it's because queries via ssh are causing direct reads to notedb to get things like user details to include in the query results, while querying that over rest api seems to use caches | 19:57 |
clarkb | whereas based on the gerrit show queue output the other cis are still cranking away at it | 19:57 |
clarkb | also upstream doesn't use ssh | 19:57 |
fungi | they don't even expose an ssh socket | 19:57 |
clarkb | so this is one of those areas where it would likely be easy for them to miss regressions. Our best bet is to characterize it as well as we can and file a bug/start a mailing list thread | 19:58 |
fungi | or if they do it's very selective (maybe the upstream gerrit's zuul is still using stream-events?) | 19:58 |
mnaser | perhaps pinging our friends at gerrithub if they have that option enabled :) | 19:58 |
clarkb | and in the meantime if people want to reach out to third party ci systems and suggest they use the http functionality that might be a good thing | 19:58 |
fungi | (or is it relying entirely on queues in the checks plugin?) | 19:58 |
clarkb | fungi: it is relying entirely on checks plugins | 19:58 |
clarkb | fungi: google doesn't allow ssh is the issue aiui | 19:58 |
fungi | ahh, okay, so then yes i guess upstream gerrit's gerrit has no ssh allowed at all | 19:59 |
clarkb | correct | 19:59 |
clarkb | when you push code you set up "git cookies" and that authenticates you via http | 19:59 |
mnaser | so there's https://github.com/GerritCodeReview/gerrit/blob/stable-3.3/java/com/google/gerrit/server/query/change/OutputStreamQuery.java | 19:59 |
fungi | out of curiosity how do they serve the commit hook to add change ids? | 19:59 |
mnaser | `final QueryStatsAttribute stats = new QueryStatsAttribute();` | 20:00 |
mnaser | it looks like gerrit is collecting 'query stats' | 20:00 |
mnaser | maybe if we can find a way to access those | 20:00 |
clarkb | fungi: http (its always been that way iirc) | 20:00 |
fungi | interesting. git-review always fetched it via scp | 20:00 |
mnaser | seems like the query stats are only printed out in the response itself | 20:00 |
clarkb | fungi: I used git review to get it from upstream gerrit yesterday | 20:01 |
fungi | maybe i've forgotten adding http support for fetching that commit hook | 20:01 |
clarkb | ya git review supports http and https for it | 20:02 |
clarkb | /tools/hooks/commit-msg is the path for it via http according to git-review | 20:02 |
mnaser | do we have any plugins inside our gerrit install? | 20:02 |
fungi | indeed, set_hooks_commit_msg() does try via http/https first | 20:03 |
fungi | and falls back to scp now | 20:03 |
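(The HTTP fetch git-review attempts first is equivalent to roughly this, using the path mentioned above.)

```
curl -o .git/hooks/commit-msg https://review.opendev.org/tools/hooks/commit-msg
chmod +x .git/hooks/commit-msg
```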
clarkb | mnaser: a small number, mostly official | 20:03 |
clarkb | its all in the gerrit job info | 20:04 |
clarkb | s/gerrit/zuul/ | 20:04 |
mnaser | i see, looking at this output stream query stuff, it seems to loop every change through plugins that can 'augment' the info | 20:04 |
mnaser | i guess we can compare https://github.com/GerritCodeReview/gerrit/blob/stable-3.3/java/com/google/gerrit/sshd/commands/Query.java to https://github.com/GerritCodeReview/gerrit/blob/stable-3.3/java/com/google/gerrit/server/restapi/change/QueryChanges.java | 20:06 |
clarkb | I would compare stable-3.2 but ya that would be one appraoch | 20:06 |
mnaser | it looks like it is a similar api | 20:07 |
mnaser | `processor.query(join(query, " "));` vs `queryProcessor.query(qb.parse(queries));` | 20:08 |
mnaser | so i guess maybe caching is happening at a higher layer (or the old query hits harder and is maybe not as efficient?) | 20:08 |
mnaser | i dunno, i don't understand it all enough, but maybe a little bit of digging here helped :< | 20:09 |
*** sboyron has quit IRC | 20:09 | |
clarkb | mnaser: ya I do wonder if the caching might be happening in http session state or something like that | 20:12 |
*** hamalq has joined #opendev | 20:16 | |
*** sboyron has joined #opendev | 20:18 | |
hashar | mnaser: about the QueryStats / OutputStreamQuery.java , that is Gerrit measuring how long it took to process a query which is reported back to the user in the response payload as "runTimeMilliseconds" | 20:27 |
hashar | for queries over ssh there should be some timing stats added in the sshd_log | 20:28 |
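(For example, the trailing stats record of an ssh query response includes that runTimeMilliseconds figure; the username below is a placeholder.)

```
ssh -p 29418 someuser@review.opendev.org gerrit query --format json status:open limit:1
# the last JSON object of the output is a stats record including
# "rowCount" and "runTimeMilliseconds"
```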
clarkb | hashar: oh thanks for that pointer that may be useful | 20:28 |
hashar | git-upload-pack responses have fine-grained details about the time it took. A series of -1 -1 -1 .. indicates a full clone, else that is a fetch of some sort | 20:28 |
hashar | do you happen to have a prometheus collector and the metrics-reporter-prometheus plugin ? | 20:29 |
hashar | (to be fair I only deployed that a few days ago for wikimedia but that has been a life changer!) | 20:30 |
hashar | we now have history of caches misses ( https://grafana.wikimedia.org/d/vHQVaGsWk/caches-upstream?orgId=1 ) | 20:30 |
hashar | and even query latencies https://grafana.wikimedia.org/d/IDkFbYeWz/latency-upstream?viewPanel=6&orgId=1&refresh=1m | 20:30 |
hashar | the devil is of course to find out what is being crippled exactly :-\ | 20:32 |
frickler | clarkb: fungi: fwiw I'm still missing gerrit mails with zuul responses, e.g. for https://review.opendev.org/c/openstack/project-config/+/763888 I got a mail with the recheck, but not with the zuul +1 | 20:39 |
fungi | possible that was lost in the queue when we restarted? | 20:40 |
fungi | ahh, no, that's more recent than the restart | 20:40 |
fungi | frickler: maybe gerrit doesn't send e-mails for bot comments? | 20:41 |
clarkb | do your project watches have a query maybe? oh thats another good theory too | 20:41 |
fungi | zuul specially marks its comments/votes as automated, and i wouldn't be surprised if gerrit intentionally leaves out notifications for those | 20:41 |
fungi | frickler: you might try adjusting your preferences for "email notifications" to the "every comment" option | 20:44 |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: Switch to container_images for push-to-intermediate-registry https://review.opendev.org/c/zuul/zuul-jobs/+/763836 | 20:44 |
sean-k-mooney | o/ did swappign to http have any effect? | 20:46 |
frickler | fungi: I have all notifications enabled. I did get mails with zuul votes earlier, also after the upgrade, but it seems to have stopped sometime between yesterday and today | 20:46 |
clarkb | sean-k-mooney: yes, and your CI system should be happier | 20:47 |
clarkb | sean-k-mooney: ianw in particular is still digging in but getting close to understanding this we think | 20:47 |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: Use script to populate test file tree fixtures https://review.opendev.org/c/zuul/zuul-jobs/+/764062 | 20:48 |
sean-k-mooney | clarkb: updating refs in the git repos seems to be faster | 20:53 |
sean-k-mooney | that's just anecdotal since i have not measured it but the logs were updating much faster after a gerrit event | 20:54 |
*** hamalq has quit IRC | 20:55 | |
*** hashar has quit IRC | 20:59 | |
*** hamalq has joined #opendev | 21:10 | |
*** hamalq has quit IRC | 21:15 | |
*** ralonsoh has quit IRC | 21:17 | |
*** sboyron has quit IRC | 21:34 | |
*** sboyron has joined #opendev | 21:34 | |
*** sboyron has quit IRC | 21:58 | |
*** sboyron has joined #opendev | 21:59 | |
fungi | clarkb: *sigh* https://gerrit-review.googlesource.com/289842 | 22:01 |
clarkb | fungi: I mean I guess they can deprecate it and we can keep pushing the broader issue | 22:02 |
openstackgerrit | Alec Hothan proposed openstack/project-config master: Enable python 3.6 job only for x/vmtp project. https://review.opendev.org/c/openstack/project-config/+/764054 | 22:03 |
fungi | yeah | 22:06 |
*** sboyron has quit IRC | 22:11 | |
*** sboyron has joined #opendev | 22:11 | |
*** sboyron has quit IRC | 22:12 | |
*** sboyron has joined #opendev | 22:13 | |
*** jentoio has quit IRC | 22:14 | |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: Switch to container_images for push-to-intermediate-registry https://review.opendev.org/c/zuul/zuul-jobs/+/763836 | 22:14 |
*** sboyron has quit IRC | 22:15 | |
*** sboyron has joined #opendev | 22:16 | |
*** sboyron has quit IRC | 22:20 | |
*** sboyron has joined #opendev | 22:20 | |
ianw | clarkb: if you have a quick sec, would you mind a run over https://review.opendev.org/q/topic:dib-3.4.0 which is two little fixes; one for the kubernetes job and then bump dib dependency. this is required to get centos builds working again so i can put nb01/02 back in operation | 22:24 |
*** sboyron has quit IRC | 22:25 | |
clarkb | I'll try | 22:26 |
*** sboyron has joined #opendev | 22:26 | |
*** jmorgan has joined #opendev | 22:29 | |
*** sboyron has quit IRC | 22:33 | |
*** sboyron has joined #opendev | 22:34 | |
ianw | ok, back to gerritbot | 22:38 |
*** sboyron has quit IRC | 22:40 | |
*** sboyron has joined #opendev | 22:41 | |
openstackgerrit | Ian Wienand proposed opendev/gerritbot master: Build against gerritlib master branch https://review.opendev.org/c/opendev/gerritbot/+/763927 | 22:42 |
ianw | fungi: ^ was just missing the required-projects on the upload step | 22:45 |
fungi | oh, heh | 22:46 |
fungi | that makes sense | 22:46 |
*** sboyron has quit IRC | 22:48 | |
*** sboyron has joined #opendev | 22:48 | |
*** sboyron has quit IRC | 22:57 | |
*** sboyron has joined #opendev | 23:07 | |
*** slaweq has quit IRC | 23:09 | |
openstackgerrit | Merged opendev/gerritbot master: Build against gerritlib master branch https://review.opendev.org/c/opendev/gerritbot/+/763927 | 23:11 |
*** slaweq has joined #opendev | 23:28 | |
*** slaweq has quit IRC | 23:33 | |
*** hamalq has joined #opendev | 23:37 | |
*** hamalq has quit IRC | 23:41 | |
*** hamalq has joined #opendev | 23:52 | |
*** tosky has quit IRC | 23:57 | |
ianw | going to try the gerritbot container that should have uploaded with master gerritlib | 23:57 |
*** hamalq has quit IRC | 23:57 | |
*** openstackgerrit has quit IRC | 23:59 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!