*** tosky has quit IRC | 00:06 | |
ianw | ok, just going to log a few things as i investigate nodepool | 00:15 |
---|---|---|
ianw | (CONNECTED [localhost:2181]) /nodepool/images/debian-stretch/builds> json_cat 0000130391 | 00:15 |
ianw | is blank | 00:15 |
ianw | this is what is causing dib-image-list to bail | 00:15 |
ianw | the node before that, 0000130390 is complete | 00:15 |
clarkb | ianw: iirc the blank entries is a known issue that bmw has been looking at figuring out | 00:16 |
clarkb | and it causes things to not delete either | 00:16 |
clarkb | which may also explain the disk use | 00:16 |
ianw | clarkb: yeah, that's where i think i'm coming at, we have a few old images from like aug that i think are orphaned | 00:17 |
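The blank-node check described above can be sketched offline. This is only an illustration, not nodepool code: `find_blank_builds` and the `snapshot` contents are hypothetical, and a real check would read each build znode from ZooKeeper rather than from a dict.

```python
def find_blank_builds(builds):
    """Return the ids whose stored data is empty (the "json_cat ... is blank" case)."""
    return sorted(
        build_id for build_id, data in builds.items() if not data.strip()
    )


# Hypothetical snapshot of /nodepool/images/debian-stretch/builds.
# The JSON payload below is made up for illustration.
snapshot = {
    "0000130390": b'{"state": "ready"}',
    "0000130391": b"",
}
blank = find_blank_builds(snapshot)
```

Listing the blank ids first, before deleting anything, keeps the manual cleanup reviewable.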
ianw | 11/14/2020 @ 11:20am (UTC) is date from 130390, i wonder if our logs go back that far | 00:23 |
fungi | log of the maintenance can be found here: http://eavesdrop.openstack.org/meetings/opendev_maint/2020/opendev_maint.2020-11-20-12.59.log.html | 00:30 |
fungi | including that in a wrap-up e-mail to the service announce list shortly | 00:30 |
ianw | 130391 is i'm sure somehow related to a whole bunch of "keystoneauth1.exceptions.connection.ConnectTimeout: Request to https://api.us-east.open-edge.io:5000/v3/auth/tokens timed out" | 00:33 |
clarkb | the router died so we may need to disable it more | 00:34 |
clarkb | I think that is why cloud launcher is failing too but I haven't had a chance to check | 00:34 |
fungi | service-announce e-mail sent, will work on the status notice next | 00:41 |
clarkb | I see it | 00:43 |
ianw | what was the tl;dr on statusbot? i just uploaded a project-config change that hasn't been announced | 00:45 |
clarkb | ianw: you mean gerritbot? | 00:45 |
ianw | sorry yeah | 00:45 |
clarkb | ianw: gerritlib is reading empty messages for some reason, it then fails to convert them to json and breaks. Fungi has been digging in with some modified code on the server and extra logging | 00:45 |
clarkb | the zuul scheduler saw similar but only when we restarted gerrit | 00:46 |
clarkb | that makes us think that maybe gerritbot is losing connectivity for some reason and reads are short or bad | 00:46 |
ianw | yeah the last entry is @ Nov 23 00:17:20 eavesdrop01 docker-gerritbot[1386]: 2020-11-23 00:17:20,239 DEBUG paramiko.transport: EOF in transport thread | 00:46 |
clarkb | one idea I threw out was maybe it's a paramiko version thing | 00:46 |
clarkb | and comparing zuul and gerritbot might be worthwhile for cryptography and paramiko package differences? | 00:47 |
clarkb | (since zuul appears to be fine) | 00:47 |
fungi | sounds like paramiko is getting disconnected and not handling it well? | 00:47 |
clarkb | (note that zuul doesn't use gerritlib but does vendor in code that is very similar) | 00:47 |
ianw | 29472 root 20 0 346724 22192 0 S 99.3 2.2 25:13.27 gerritbot | 00:47 |
ianw | it's gone a bit nuts and is taking up 100% cpu | 00:48 |
fungi | not surprising, before i added a check that readline() returned something truthy, it was going into fits of reading thousands of zero-length strings per second | 00:49 |
ianw | poll([{fd=6, events=POLLIN|POLLPRI|POLLOUT}], 1, -1) = 1 ([{fd=6, revents=POLLIN}]) | 00:50 |
ianw | umb-init 29459 root 0u CHR 1,3 0t0 6 /dev/null | 00:50 |
fungi | before i began to hack on it, the process was crashing on a readline() which returned something the json module couldn't parse, in retrospect seems to have been a null string | 00:50 |
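The failure mode fungi describes (a `readline()` that keeps returning empty strings, and a JSON parser that chokes on them) can be sketched as below. This is not the gerritlib code. It assumes the stream behaves like a Python file object, where an empty string from `readline()` means the remote end has closed the connection. `read_events` and `fake_stream` are hypothetical names.

```python
import io
import json


def read_events(stream):
    """Yield decoded JSON events from a line-oriented stream."""
    while True:
        line = stream.readline()
        if not line:
            # An empty string means EOF, not "no data yet". Stop here
            # instead of spinning on thousands of zero-length reads.
            return
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON rather than crashing.
            continue


# Stand-in for the SSH channel: two events, one bad line, then EOF.
fake_stream = io.StringIO(
    '{"type": "comment-added"}\nnot json\n{"type": "change-merged"}\n'
)
events = list(read_events(fake_stream))
```

In a real watcher, returning on EOF would be the point at which to reconnect, so the consumer does not silently stop receiving events.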
ianw | oh, no i've read it wrong | 00:50 |
ianw | paramiko 2.7.1 | 00:51 |
ianw | upstream is at 2.7.2 ... so it's recent | 00:51 |
clarkb | 2.7.2 fixed some bugs though | 00:52 |
fungi | #status notice Our Gerrit upgrade maintenance has concluded successfully; please see the maintenance wrap-up announcement for additional details: http://lists.opendev.org/pipermail/service-announce/2020-November/000014.html | 00:52 |
openstackstatus | fungi: sending notice | 00:52 |
clarkb | http://www.paramiko.org/changelog.html | 00:52 |
clarkb | and I bet zuul is on 2.7.2 because we rebuild that image often? | 00:52 |
-openstackstatus- NOTICE: Our Gerrit upgrade maintenance has concluded successfully; please see the maintenance wrap-up announcement for additional details: http://lists.opendev.org/pipermail/service-announce/2020-November/000014.html | 00:52 | |
ianw | 2.7.1 -> 2.7.2 doesn't look like anything relevant; a string format fix and something about ssh rsa key loading | 00:53 |
clarkb | might also be easier to do a local test harness if it is reproducible | 00:53 |
clarkb | but then can narrow it down hopefully | 00:54 |
ianw | i can try and get a backtrace | 00:54 |
clarkb | there were also some recent gerritlib updates around the gerritwatcher iirc | 00:55 |
clarkb | I guess we can't rule out that they were fine with 2.13 but not 3.2? | 00:55 |
clarkb | I dunno brain is tired | 00:55 |
openstackstatus | fungi: finished sending notice | 00:55 |
clarkb | cool ^ I think that is my cue to enjoy the evening | 00:56 |
fungi | and with that, i need to knock off as well | 00:56 |
ianw | ++ thanks! | 00:56 |
ianw | i'll see what i can come up with for gerritbot, and nodepool-builders are turned off and frankly a mess in ZK it seems | 00:56 |
ianw | i'll try and get that going too | 00:56 |
clarkb | thank you | 00:56 |
fungi | this wouldn't be the first time we needed to manually delete null nodes out of zk | 00:57 |
ianw | i feel like the missing openedge is a part of it; proposed https://review.opendev.org/c/openstack/project-config/+/763676 to remove uploads | 00:58 |
fungi | ianw: approved, but i expect that can leave cruft in zk for the uploaded image records | 01:01 |
ianw | hrm, i guess the gerritbot container isn't privileged so i can't attach a debugger to it to get a bt | 01:02 |
fungi | for what it's worth, i've been editing /var/lib/docker/aufs/diff/876e0b03c6ccb1863c0ef28fd27a9984845700722cd4feb1e2330cdf9f37a7d9/usr/local/lib/python3.7/site-packages/gerritlib/gerrit.py to add debugging and then downing and upping the container | 01:03 |
fungi | i also added eavesdrop to the emergency disable list so ansible wouldn't undo the debug level i set on handler_console in /etc/gerritbot/logging.conf | 01:04 |
fungi | this is probably all proof that i'm bad at containering, apologies in advance | 01:05 |
ianw | ... i could chroot in there, then run gdb on the process ... | 01:05 |
fungi | and remember that we're using released versions of gerritbot in that container, there are some refactors in mastr | 01:06 |
fungi | er, in master | 01:06 |
fungi | er, and gerritlib | 01:06 |
fungi | released versions of gerritLIB | 01:06 |
fungi | i need sleep | 01:06 |
ianw | alright, attaching gdb is a futile exercise because it uses the /usr/local/bin/python , which has no symbols | 01:21 |
ianw | despite chrooting into the container working (i think the "real" way to do this is to start a separate container in the same namespace) | 01:22 |
*** hamalq has joined #opendev | 01:32 | |
*** hamalq has quit IRC | 01:37 | |
ianw | fungi: in a root screen on eavesdrop i have gerritbot now running under a gdb we can break into and get a python backtrace with py-bt | 01:58 |
ianw | basically it's running a container; i've installed the debian versions of python (so we have symbols), added site-packages to the path (so the deb python3.7 finds the pip installed libraries) and run it manually under gdb | 01:59 |
ianw | so if it goes bananas again, hopefully we can ctrl-c, "py-bt" and have a pretty good idea of what's going on | 01:59 |
ianw | ok, with that in monitoring phase, back to the nodepool builders. i'm going to emergency them so the containers stop for good | 02:06 |
mordred | ianw: you should be able to docker exec into the running container instead of needing to chroot in to it. if you docker exec $containerid bash - you'll get a bash shell in the existing container context | 02:17 |
mordred | just - fwiw | 02:17 |
mordred | fungi: ^^ | 02:17 |
ianw | mordred: yeah, that was a total hack, because that container wasn't started with the right permissions, so gdb inside it couldn't attach. so i chrooted into the container outside as root to run the gdb from the container, but in a context it could ptrace | 02:18 |
ianw | into the container fs i mean | 02:18 |
mordred | hah - awesome | 02:18 |
ianw | but ... that didn't work anyway because it's running under the python-slim /usr/local/bin/python, which has no symbols | 02:19 |
ianw | i've dropped /root/ianw-notes.txt on this, or screen number 2 in the screen session | 02:19 |
fungi | thanks | 02:26 |
ianw | ok, i'm back to trying to figure out how to cleanup nodepool | 02:44 |
ianw | image-list shows all these failed uploads | 02:45 |
ianw | i feel like if i just delete all the debian-stretch builds, it should start fresh | 02:45 |
*** ykarel has joined #opendev | 03:03 | |
ianw | i'm starting nb01 and seeing if it builds centos-7, which is very old | 03:04 |
ianw | ok, we need a dib release with Ib292b0b2b31bd966e0c5e8f2b2ce560bba89c45c for centos7 | 03:29 |
*** hamalq has joined #opendev | 03:33 | |
*** hamalq has quit IRC | 03:37 | |
ianw | fungi: ok, looks like it died | 03:39 |
ianw | 2020-11-23 02:57:16,501 DEBUG paramiko.transport: EOF in transport thread | 03:40 |
ianw | [Thread 0x7fffeffff700 (LWP 750) exited] | 03:40 |
ianw | so, straight after that paramiko.transport debug message the thread exits, and that's when it goes haywire | 03:52 |
ianw | i've restarted it with a breakpoint on pthread_exit for the ssh thread ... see if we can get something interesting there | 03:52 |
*** ykarel has quit IRC | 04:09 | |
*** raukadah is now known as chandankumar | 04:46 | |
*** ykarel has joined #opendev | 05:15 | |
*** openstackgerrit has joined #opendev | 05:15 | |
openstackgerrit | Merged openstack/diskimage-builder master: Fix dynamic-login with grub2 https://review.opendev.org/c/openstack/diskimage-builder/+/763566 | 05:15 |
openstackgerrit | Merged openstack/diskimage-builder master: Fix python-stow-versions https://review.opendev.org/c/openstack/diskimage-builder/+/751610 | 05:16 |
*** sgw has joined #opendev | 05:27 | |
openstackgerrit | Merged opendev/system-config master: codesearch: Add robots.txt https://review.opendev.org/c/opendev/system-config/+/763499 | 05:41 |
*** ysandeep|off is now known as ysandeep | 05:54 | |
*** danpawlik has quit IRC | 06:24 | |
*** danpawlik has joined #opendev | 06:24 | |
openstackgerrit | Merged opendev/system-config master: Clean up cron tab entry from ansible once removed from host https://review.opendev.org/c/opendev/system-config/+/758599 | 06:42 |
*** whoami-rajat__ has joined #opendev | 06:46 | |
*** sboyron has joined #opendev | 06:52 | |
ianw | infra-root: builder status update -- I have manually cleared out all the failed things from zookeeper that i think started with the disappearance of openedge and nodepool was not dealing with | 07:05 |
ianw | i have removed all orphaned images from /opt on nb01 and it is slowly building | 07:05 |
ianw | centos-7 is currently failing to build and will need a dib release + nodepool-image update to work. i have been through dib reviews and am merging a few outstanding things before a release | 07:06 |
ianw | centos-7 is paused in nb01 | 07:06 |
*** iurygregory has joined #opendev | 07:06 | |
ianw | both nb01 and nb02 are in emergency, and only builder is running on nb01 | 07:06 |
ianw | i don't know what's special about rax, but they have a bunch of images in the alien list. i'll clean those up tomorrow | 07:07 |
*** lpetrut has joined #opendev | 07:07 | |
ianw | centos-8-stream also failed, and i need to look into that | 07:10 |
openstackgerrit | Dong Ma proposed openstack/project-config master: Remove ceilometer-zvm entry https://review.opendev.org/c/openstack/project-config/+/763742 | 07:12 |
*** DSpider has joined #opendev | 07:17 | |
*** rpittau|afk is now known as rpittau | 07:29 | |
frickler | zigo: do you know how to write a watch file that doesn't prefer pre-releases for pypi projects? see https://pypi.debian.net/git-review if I run uscan for that, I get 1.28.0.0a1 instead of 1.28.0 | 07:36 |
frickler | (context: we need 1.27.0 or newer in order to work properly with our updated gerrit) | 07:36 |
*** ralonsoh has joined #opendev | 07:39 | |
*** eolivare has joined #opendev | 07:53 | |
ianw | frickler: hey, just wanted to call out prior conversation that gerritbot is currently running "manually" in a screen session on eavesdrop | 07:57 |
ianw | it's running under gdb; if it stops, it would be great to get a py-bt ... it should *hopefully* catch the ssh communication thread exiting and hopefully give us a clue as to why it does that | 07:58 |
ianw | there's a few notes in /root/ianw-notes.txt | 07:58 |
ianw | just in case as people wake up and start complaining :) | 07:59 |
*** slaweq has joined #opendev | 08:01 | |
cgoncalves | hey folks! it looks like the Gerrit update went smooth for the most part? well done, team! | 08:13 |
cgoncalves | I wonder if anyone else is also receiving email notifications from Gerrit for all events (commentary and new PS) | 08:14 |
cgoncalves | I have 276+ emails in my inbox for projects I didn't even know they exist | 08:15 |
*** andrewbonney has joined #opendev | 08:23 | |
*** tosky has joined #opendev | 08:38 | |
frickler | cgoncalves: can you check your settings at https://review.opendev.org/settings/#Notifications ? gerrit should only mail notifications if a project is listed there or if you are included as reviewer for a patch | 08:55 |
cgoncalves | frickler, notification settings look good and have not changed prior to gerrit update | 08:56 |
*** mgoddard has joined #opendev | 08:58 | |
frickler | cgoncalves: hmm, o.k., do you have a sample patch for which you received a mail but shouldn't have? | 08:58 |
cgoncalves | frickler, project cyborg: https://i.snipboard.io/guF4wS.jpg | 08:59 |
zigo | frickler: Yes, I do, I have the same watch file for all projects, and it works well. | 08:59 |
cgoncalves | I have never contributed/commented on that project | 08:59 |
zigo | frickler: Example for Nova: | 09:00 |
zigo | $ cat debian/watch | 09:00 |
zigo | version=3 | 09:00 |
zigo | opts="uversionmangle=s/\.0rc/~rc/;s/\.0b1/~b1/;s/\.0b2/~b2/;s/\.0b3/~b3/" \ | 09:00 |
zigo | https://github.com/openstack/nova/tags .*/(\d[brc\d\.]+)\.tar\.gz | 09:00 |
zigo | This doesn't take into account the "a" thing, though that's just adding one more char in the mix. | 09:00 |
zigo | frickler: I packaged git-review 1.28.0 to Testing/Unstable, should I backport it to buster-backports official ? | 09:01 |
zigo | frickler: What I don't know, is how to fetch a git tag list with opendev.org ... | 09:04 |
zigo | Can you help me with that ? | 09:04 |
frickler | zigo: not sure I understand that question, you want to see a tag list without cloning the repo? | 09:06 |
zigo | frickler: Yeah, so that I can point my watch files to it. | 09:07 |
*** fressi has joined #opendev | 09:08 | |
frickler | zigo: hmm, o.k. I see the github equivalent in your sample above. that will likely need a feature request for gitea. there is a request in the api but I doubt that that can be used in the watch file | 09:11 |
frickler | curl -X GET "https://opendev.org/api/v1/repos/opendev/git-review/tags" -H "accept: application/json" | 09:11 |
zigo | Ah, that helps, will try it, thanks. | 09:12 |
ttx | I noticed some extreme Gerrit dashboards failing with "Error 400 (Bad Request): limit of 10 queries" | 09:15 |
ttx | not a big deal, but thought i would report | 09:16 |
ttx | Example: https://tiny.cc/ReleaseInbox | 09:16 |
frickler | ttx: hmm, we'll have to see how we can fix that, I guess. could you please add the issue to https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes ? | 09:17 |
ttx | sure thing! There might be a tunable there | 09:18 |
*** sboyron has quit IRC | 09:18 | |
*** sboyron has joined #opendev | 09:19 | |
zigo | frickler: Any idea why this doesn't work? | 09:24 |
zigo | curl https://opendev.org/api/v1/repos/opendev/git-review/tags | grep --color -E '(\d[abrc\d\.]+)\.tar\.gz' | 09:24 |
*** hamalq has joined #opendev | 09:36 | |
frickler | zigo: seems grep uses [:digit:] instead of \d , or just use '([0-9][abcr0-9.]+)\.tar\.gz' | 09:36 |
frickler | at least that works for me with gnu grep 3.1 | 09:37 |
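As far as I know, `grep -E` uses POSIX extended regular expressions, which do not define `\d`, which is why the bracket form works where zigo's pattern did not. The same pattern can be tried in Python, along with a filter that keeps only final releases, which is what frickler originally wanted from uscan. The `sample` string is a hypothetical stand-in, not real output from the tags API.

```python
import re

# Same pattern as the one suggested for grep -E, with \d written as [0-9].
TARBALL = re.compile(r"([0-9][abcr0-9.]+)\.tar\.gz")

# Hypothetical sample of tarball links; the host and paths are made up.
sample = (
    "https://example.invalid/archive/1.28.0.tar.gz "
    "https://example.invalid/archive/1.28.0.0a1.tar.gz"
)

versions = TARBALL.findall(sample)

# Keep only purely numeric dotted versions, dropping a/b/rc pre-releases.
finals = [v for v in versions if re.fullmatch(r"[0-9]+(\.[0-9]+)*", v)]
```

The second filter is stricter than the capture group on purpose: the capture group has to accept pre-release suffixes so they can be recognised and discarded.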
frickler | cgoncalves: fwiw I see the mails to you in exim log but I didn't find a gerrit log showing why these are being sent. maybe someone with more clue can have a look later | 09:39 |
*** hamalq has quit IRC | 09:40 | |
zigo | frickler: Works, but I still can't figure out how to write the debian/watch file ... :/ | 09:41 |
cgoncalves | frickler, thanks for looking. please let me know if I can provide any additional info | 09:44 |
*** dtantsur|afk is now known as dtantsur | 09:50 | |
zigo | frickler: I got it to work with mode=git ! :) | 09:51 |
zigo | frickler: $ cat debian/watch | 09:53 |
zigo | version=3 | 09:53 |
zigo | opts="mode=git ,uversionmangle=s/\.0rc/~rc/;s/\.0a/~a/;s/\.0b1/~b1/;s/\.0b2/~b2/;s/\.0b3/~b3/" \ | 09:53 |
zigo | https://opendev.org/opendev/git-review refs/tags/(\d[abrc\d\.]+) | 09:53 |
zigo | Still, my original question remains: should I upload git-review to official Debian backports? | 09:56 |
dtantsur | hey folks! great job with the gerrit update. notification emails about zuul comments no longer have clickable links, is it expected? | 10:04 |
frickler | zigo: I don't know much about backports, if you are unsure best wait for feedback from fungi or some other infra-root. | 10:19 |
zigo | frickler: Ok. Fungi will be of good advice, I'm sure. :) | 10:20 |
frickler | dtantsur: my emails were switched to html format, I needed to switch the config back to text-only, maybe it's the other way round for you? | 10:20 |
dtantsur | need to check, thanks for the hint | 10:20 |
frickler | or maybe links in text format work, because the email client auto-linkifies them, but they aren't links in html-format? | 10:22 |
dtantsur | "To view, visit change 762369. To unsubscribe, or for help writing mail filters, visit settings." <-- the links are correct here | 10:23 |
dtantsur | but not in the body. hmm. | 10:23 |
dtantsur | the body is an HTML without links, simply <li> tags | 10:23 |
frickler | dtantsur: so switching back to text-only might help. see https://review.opendev.org/settings/#Notifications Preferences/Email format | 10:24 |
zigo | Is there currently a way to simply wget a patch matching a review? I used to click on the gitweb link which is currently broken (and then later on, on the patch link). | 10:25 |
dtantsur | frickler: I'll try, but I used to use the HTML view and it worked correctly | 10:25 |
frickler | dtantsur: in that case you may want to add it to the etherpad as regression | 10:26 |
dtantsur | link? | 10:26 |
zigo | The only way I found was this: | 10:26 |
zigo | curl "https://review.opendev.org/changes/openstack%2Fnova~763750/revisions/1/patch?download" | base64 -d >1.patch | 10:26 |
zigo | A little bit annoying, but works. | 10:26 |
dtantsur | zigo: I usually do ^^^ | 10:26 |
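The `curl ... | base64 -d` step above has a direct Python equivalent. The sketch below only covers the decoding, using an offline stand-in for the response body; `decode_gerrit_patch` and `fake_body` are hypothetical names, and the fake patch text is not a real Gerrit reply.

```python
import base64


def decode_gerrit_patch(payload: bytes) -> str:
    """Decode the base64 body returned by the '/patch?download' URL."""
    return base64.b64decode(payload).decode("utf-8")


# Offline stand-in for the HTTP response body.
fake_body = base64.b64encode(
    b"From 0000000 Mon Sep 17 00:00:00 2001\nSubject: [PATCH] example\n"
)
patch_text = decode_gerrit_patch(fake_body)
```

To use it against a live server, the payload could be fetched with `urllib.request.urlopen(url).read()` using the URL from the curl command and passed to the same function.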
*** hashar has joined #opendev | 10:27 | |
frickler | dtantsur: https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes | 10:29 |
dtantsur | added, thanks | 10:30 |
dtantsur | I wonder if it's the same as line 35 | 10:32 |
*** mordred has quit IRC | 10:34 | |
*** Eighth_Doctor has quit IRC | 10:34 | |
*** iurygregory has quit IRC | 10:39 | |
*** Eighth_Doctor has joined #opendev | 10:42 | |
frickler | dtantsur: iiuc that is only about the rendering in the UI, I don't think that that should be related to email formatting | 10:45 |
*** ralonsoh has quit IRC | 10:51 | |
*** ralonsoh has joined #opendev | 10:53 | |
*** mordred has joined #opendev | 10:59 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/c/zuul/zuul-jobs/+/740935 | 11:20 |
*** hamalq has joined #opendev | 11:36 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/c/zuul/zuul-jobs/+/740935 | 11:37 |
*** hamalq has quit IRC | 11:41 | |
*** ysandeep is now known as ysandeep|brb | 11:48 | |
*** iurygregory has joined #opendev | 11:58 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/c/zuul/zuul-jobs/+/740935 | 12:14 |
*** ysandeep|brb is now known as ysandeep | 12:19 | |
*** sboyron has quit IRC | 12:28 | |
*** DSpider has quit IRC | 12:32 | |
*** sboyron has joined #opendev | 12:37 | |
*** sboyron has quit IRC | 12:49 | |
*** sboyron has joined #opendev | 12:51 | |
*** sboyron has quit IRC | 12:56 | |
*** sboyron has joined #opendev | 13:01 | |
sean-k-mooney | is there a way to disable the draft comments section or move it in the dashboard | 13:04 |
sean-k-mooney | reading https://bugs.chromium.org/p/gerrit/issues/detail?id=9740 currently but no solution so far | 13:04 |
slaweq | frickler: tobiash and other infra cores, can You review patch https://review.opendev.org/c/zuul/zuul-jobs/+/762650 ? It's a workaround for the issue with communication between nodes in multinode ovn based jobs | 13:04 |
slaweq | it works as You can see in https://review.opendev.org/c/openstack/neutron/+/762654 in neutron-ovn-tempest-slow result which finally passed | 13:05 |
sean-k-mooney | when did we start using ovs for that again instead of just creating vxlan tunnels in the kernel | 13:06 |
sean-k-mooney | i thought we moved away from ovs because it conflicted with some other jobs in the past | 13:10 |
sean-k-mooney | i know it caused issues with our third party ci for ovs-dpdk at one point again because it compiled ovs from source | 13:11 |
sean-k-mooney | that said we didn't actually need the tunnel for that third party ci so it mostly worked fine without the tunnel | 13:12 |
*** sboyron_ has joined #opendev | 13:13 | |
*** sboyron has quit IRC | 13:15 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: WIP: Allow user to enable wip by default https://review.opendev.org/c/opendev/git-review/+/763780 | 13:19 |
*** sboyron_ has quit IRC | 13:21 | |
*** sboyron_ has joined #opendev | 13:21 | |
*** sboyron__ has joined #opendev | 13:23 | |
*** sboyron_ has quit IRC | 13:25 | |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: WIP: Allow user to enable wip by default https://review.opendev.org/c/opendev/git-review/+/763780 | 13:26 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: WIP: Allow user to enable wip by default https://review.opendev.org/c/opendev/git-review/+/763780 | 13:27 |
*** sboyron__ has quit IRC | 13:28 | |
*** sboyron__ has joined #opendev | 13:28 | |
*** sboyron__ has quit IRC | 13:29 | |
*** sboyron__ has joined #opendev | 13:33 | |
*** sboyron__ is now known as sboyron | 13:33 | |
*** hamalq has joined #opendev | 13:37 | |
*** hamalq has quit IRC | 13:42 | |
*** auristor has quit IRC | 14:00 | |
fungi | zigo: git-review 1.27 in buster should also be new enough to work with our gerrit, i think... is it not? | 14:00 |
zigo | fungi: Hi there! | 14:01 |
zigo | Well, I've just heard that for the new gerrit of this week-end, one must use git-review >= 1.28 ... | 14:01 |
fungi | i thought we said >=1.27 but maybe we got it wrong | 14:01 |
zigo | fungi: Oh, you're right, I misread what was written in this channel this morning. | 14:02 |
zigo | So I won't do a backport then. | 14:02 |
fungi | well, it can't hurt to confirm 1.27 is working okay... i'll try to test that shortly | 14:02 |
zigo | fungi: FYI, I just succeeded in using mode=git in watch files, it worked for git-review for me ! :) | 14:02 |
fungi | oh excellent | 14:02 |
zigo | The part that I was missing was the need to add "refs/tags/" before the version definition. | 14:03 |
zigo | So, something like this works: | 14:03 |
zigo | version=3 | 14:03 |
zigo | opts="mode=git, uversionmangle=s/\.0rc/~rc/;s/\.0a/~a/;s/\.0b1/~b1/;s/\.0b2/~b2/;s/\.0b3/~b3/" \ | 14:03 |
zigo | https://opendev.org/opendev/git-review refs/tags/(\d[abrc\d\.]+) | 14:03 |
zigo | I'm not sure how many packages I have to update with that fix though, I've lost track of that... :( | 14:04 |
*** auristor has joined #opendev | 14:11 | |
fungi | any reason you don't just do s/\.0b/~b/ like you do with .0a? | 14:17 |
fungi | also be aware those are python-specific (pep 440) versioning conventions, so you wouldn't probably apply them to non-python projects | 14:17 |
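The effect of the `uversionmangle` rules can be checked with a small Python sketch. It mirrors the substitutions quoted above, with the three beta rules collapsed into one as fungi suggests. `RULES` and `mangle` are hypothetical names, and this is not how uscan itself is implemented.

```python
import re

# (pattern, replacement) pairs mirroring the watch file rules.
RULES = [
    (r"\.0rc", "~rc"),
    (r"\.0a", "~a"),
    (r"\.0b", "~b"),
]


def mangle(version):
    """Turn a PEP 440 style pre-release into a Debian style '~' version."""
    for pattern, replacement in RULES:
        # count=1 mirrors a sed-style s/// without the g flag.
        version = re.sub(pattern, replacement, version, count=1)
    return version
```

The tilde matters because Debian version comparison sorts `~` before everything else, including the end of the string, so `1.28.0~a1` is treated as older than `1.28.0`.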
*** sboyron has quit IRC | 14:28 | |
*** sboyron has joined #opendev | 14:29 | |
*** rpittau is now known as rpittau|brb | 14:36 | |
tristanC | we are interested in porting the hideci.js script for the new gerrit ui, has this work started already? | 14:51 |
*** ykarel has quit IRC | 14:52 | |
clarkb | tristanC: not that I am aware of. One thought was to see about integrating it into the existong zuul plugin | 15:01 |
*** rpittau|brb is now known as rpittau | 15:01 | |
clarkb | also note that our workaround for x/foo repos may currently prevent polygerrit plugins from working | 15:01 |
clarkb | we're trying to work with upstream to figure out how to address this properly | 15:02 |
tristanC | clarkb: where is this work with upstream happening? | 15:02 |
clarkb | tristanC: for the zuul plugin or the /x/ conflict? | 15:03 |
tristanC | for the zuul plugin | 15:04 |
clarkb | tristanC: https://gerrit-review.googlesource.com/q/project:plugins/zuul we don't use the plugin (yet) but wikimedia does | 15:05 |
tristanC | software-factory is also upgrading to gerrit-3.x and we are looking for fixing the remaining issues. the zuul ci report table seems to be biggest one | 15:06 |
clarkb | once the upgrade itself settles a bit more we'd like to start looking at using more plugins like the zuul one (basically we aren't using it yet because it simplified upgrading and all that but now that we only have to build ~1 image and have ~1 gerrit to test it's much easier to start looking at that stuff) | 15:07 |
tristanC | thank you for the pointer, so the plugins/zuul seems to be written in a mix of java and polymer javascript templates. I'm not sure we'll be able to contribute to that | 15:10 |
clarkb | tristanC: what is the issue with that? | 15:11 |
clarkb | but ya aiui all gerrit plugins are either going to be java or polygerrit js or both | 15:11 |
tristanC | clarkb: well we'll have to learn the language. we were more looking at re-using the existing simpler hideci.js and adding it to the review page | 15:13 |
clarkb | tristanC: you've got us all looking at different languages over in #zuul :P | 15:13 |
tristanC | clarkb: which were all functional... well if you think that using the plugins/zuul is the way to go, we can have a look, but that seems like a lot more work than adapting the hideci script | 15:16 |
clarkb | tristanC: yes, we think it is better because then we can adhere to proper apis and not be broken every time gerrit updates | 15:16 |
*** DSpider has joined #opendev | 15:16 | |
clarkb | hideci was always a hack, there is a non hacky way to do it now and we think that is preferable | 15:18 |
clarkb | tristanC: re functional, yes but zuul and its associated code are not written functionally or in functional languages. I guess my point was more that it's ok to learn new languages if they provide a benefit | 15:18 |
clarkb | in your case it provided benefits to zuul's k8s operator config file management. In this case it provides a benefit to your use of gerrit | 15:18 |
clarkb | TheJulia: I know you're running the ironic meeting right now, but when that is done I would be curious to hear more about the graphical issues you have had. I have yet to encounter anything like that in testing or post upgrade | 15:20 |
clarkb | I wonder if it could be related to browser hardware acceleration | 15:20 |
clarkb | fungi: ianw: it seems that gerritbot is still running. So maybe something with how the container is set up? I haven't looked at ianw's new notes on that yet though and maybe that has more info I should catch up on | 15:22 |
fungi | clarkb: i think the watcher thread dies and then gerritbot stops getting new events, but the gerritbot process itself lives on doing nothing | 15:25 |
clarkb | fungi: ya I mean whatever ianw has done has kept it reporting | 15:25 |
fungi | oh, i see. cool! | 15:25 |
clarkb | at least as recently as 15:08UTC in the ironic channel | 15:25 |
fungi | i have yet to revisit that problem | 15:25 |
openstackgerrit | Merged openstack/diskimage-builder master: Add support for vlan interfaces in dhcp-all-interfaces.sh https://review.opendev.org/c/openstack/diskimage-builder/+/761177 | 15:27 |
clarkb | as recently as 15:27 now :) | 15:27 |
fungi | heh | 15:27 |
*** elod has quit IRC | 15:27 | |
*** lpetrut has quit IRC | 15:28 | |
*** elod has joined #opendev | 15:28 | |
tristanC | clarkb: i worry that learning all that gerrit plugin development stack is going to take a lot more time than simply porting the previous hack. Perhaps we could start by restoring the missing feature using the hideci script, and then replace it by the plugin when it's ready/installed? | 15:30 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Run tag-releases on ubuntu-focal https://review.opendev.org/c/openstack/project-config/+/763797 | 15:30 |
fungi | tristanC: how much sunk cost is that every time some new change to polygerrit causes hideci.js to break and needs forward-porting? | 15:31 |
tristanC | fungi: i think in a day or two we can get it working | 15:31 |
fungi | each time it breaks? | 15:31 |
tristanC | perhaps it can be improved to break less often too | 15:32 |
tristanC | looking at gerrit dev-plugin documentation, it seems like it supports `Web UI plugins distributed as a single .js file` | 15:33 |
clarkb | tristanC: yes I think those are the "polygerrit" plugins | 15:33 |
clarkb | those are the type we expect won't work in our current deployment (though we intend on fixing that) | 15:34 |
fungi | specifically because of our workaround to be able to continue to clone x/.* repositories from gerrit | 15:34 |
fungi | https://bugs.chromium.org/p/gerrit/issues/detail?id=13721 | 15:34 |
clarkb | fungi: after meetings I was going to try and summarize the ml thread thoughts on ^ and do my best to get people to shift conversation there | 15:35 |
*** fbo has joined #opendev | 15:37 | |
*** hamalq has joined #opendev | 15:38 | |
tristanC | fungi: it's also that the zuul plugin seems more complex than just display the ci result in a table, e.g. https://gerrit-review.googlesource.com/c/plugins/zuul/+/275024/1 | 15:39 |
tristanC | fungi: so i agree hideci sounds like a sunk cost, but i think it's worth a try to at least restore the feature with 3.2 | 15:40 |
openstackgerrit | Merged openstack/diskimage-builder master: simple-init: also remove en* interfaces from the images https://review.opendev.org/c/openstack/diskimage-builder/+/763660 | 15:42 |
*** fressi has quit IRC | 15:42 | |
*** hamalq has quit IRC | 15:42 | |
*** fressi has joined #opendev | 15:43 | |
*** chandankumar is now known as raukadah | 15:49 | |
clarkb | I've annotated thoughts on the post upgrade notes etherpad to try and call out what I suspect are sources of problems or where upstream bugs should be filed | 15:52 |
*** ysandeep is now known as ysandeep|brb | 15:52 | |
*** elod has quit IRC | 15:54 | |
*** elod has joined #opendev | 15:54 | |
fungi | thanks! | 15:57 |
*** mlavalle has joined #opendev | 16:03 | |
openstackgerrit | Merged openstack/project-config master: Run tag-releases on ubuntu-focal https://review.opendev.org/c/openstack/project-config/+/763797 | 16:08 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Make py36 minimum version required https://review.opendev.org/c/opendev/git-review/+/763803 | 16:11 |
clarkb | zigo: ^ that will break xenial users. I don't think we should do that | 16:12 |
*** mgoddard has quit IRC | 16:12 | |
clarkb | sorry zigo that was for zbr ^ | 16:12 |
*** xavpaice has quit IRC | 16:12 | |
*** mgoddard has joined #opendev | 16:15 | |
zbr | clarkb: do we have a support contract that binds us to support it? i am sure we cannot run out of distros using unsupported python versions. | 16:16 |
clarkb | zbr: we use xenial still | 16:16 |
clarkb | we maintain the software | 16:16 |
clarkb | therefore I'm -2 on that | 16:16 |
zbr | clarkb: dropping it does not render existing version unusable | 16:17 |
clarkb | zbr: no but if say we upgrade and things change again it could | 16:17 |
clarkb | and really keeping 3.5 support isn't that big of a burden | 16:17 |
zbr | py35 cannot have inline type hints | 16:17 |
clarkb | ya I'm fine with that | 16:17 |
zbr | for py35 user must create pyi files which are a pita | 16:18 |
clarkb | the benefit to making those changes in a simple tool like this doesn't outweigh supporting existing users | 16:18 |
clarkb | which includes ourselves | 16:18 |
zbr | clarkb: ok.... not pleased but if we are using it.... | 16:19 |
clarkb | zbr: also I think seeing the trouble 1.26 has caused on even newer distros points to us needing to be cautious with that utility more generally | 16:19 |
clarkb | it's small, we can manage to keep old python support for a little while longer | 16:19 |
zbr | it makes me a bit curious because git-review is a developer tool, which raises the question of which developer runs xenial on his machine in 2020. | 16:20 |
fungi | it also gets run in automation | 16:20 |
zbr | clarkb: have you read the news? ansible 3.0 (previously known as 2.11) will require python 3.8 on controller. | 16:22 |
zbr | https://www.reddit.com/r/ansible/comments/jwzwwf/ansible300_schedule_and_preview_of_400_schedule/ | 16:22 |
clarkb | ok? | 16:22 |
clarkb | I hadn't but I guess we just won't upgrade for a while then | 16:22 |
zbr | that is what i said too, it will make it even worse to upgrade ansible. | 16:23 |
zbr | maybe for git-review it is not a big deal as it has very few deps, but for other projects it is a real PITA to support py35 because almost all library vendors already dropped support for it. | 16:25 |
zbr | if something goes bad in requests, i doubt they will want to make a new release of it. | 16:26 |
clarkb | well pypi also does version specific installs now | 16:27 |
clarkb | er pypi + pip | 16:27 |
clarkb | so it shouldn't be a major problem as long as those deps have annotated themselves properly | 16:27 |
fungi | except insofar as most of those dependencies probably aren't backporting security fixes to patch releases and just roll forward, leaving tools which need older python on older vulnerable versions of those deps | 16:28 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Make py35 minimum version required https://review.opendev.org/c/opendev/git-review/+/763803 | 16:29 |
zbr | i think we should update the bot to skip notifications for WIP changes, what do you think? | 16:30 |
*** ysandeep|brb is now known as ysandeep | 16:30 | |
clarkb | zbr: we need to fix the bot's ssh streaming first | 16:31 |
fungi | that seems like it could be a good improvement, though i don't think it's urgent since we didn't have a wip feature before now and not many people are likely to start using it right away | 16:31 |
fungi | and yeah, bug fixes/regressions related to the upgrade take priority | 16:32 |
clarkb | whatever ianw has done seems to make it happy | 16:32 |
clarkb | so we probably just need to work backward from there | 16:32 |
fungi | yeah, i'm going to not touch it for now as long as it continues running, and sync up with him once he's awake | 16:32 |
fungi | we have plenty of other fish to fry in the meantime | 16:32 |
zbr | i also observed a lot of verbosity when uploading changes "Processing changes: updated: " --- progress like ones. I wonder if we can disable them, or to filter inside git-review. | 16:36 |
*** lpetrut has joined #opendev | 16:38 | |
clarkb | as a heads up TheJulia seems to have worked around the new ui graphical issues by disabling hardware acceleration in the browser | 16:40 |
clarkb | if anyone sees that this appears to be the workaround | 16:40 |
fungi | was it chromium? apparently chromium 86 also crashes if you try to use webrtc, cause suspected to be related to hardware accel | 16:42 |
clarkb | chrome | 16:42 |
fungi | supposedly 87 fixes it | 16:42 |
clarkb | though not sure if that is just short for chromium in this context | 16:42 |
fungi | ahh, maybe not the same problem then | 16:42 |
zbr | fungi: maybe you can help me with clarification about how to add a generic tox-py39 job. https://review.opendev.org/c/zuul/zuul-jobs/+/762192 | 16:51 |
zbr | i was asked to remove the ubuntu-focal nodeset, which obviously produced a failure to find py39 as it is not available on bionic (current default nodeset) | 16:52 |
fungi | zbr: in a meeting right now but i can look for an example after i get the release jobs fix tested | 16:54 |
zbr | sure, no pressure. thanks | 16:54 |
cgoncalves | fungi, clarkb: earlier today I reported here on the channel an issue I'm having post-gerrit upgrade. Gerrit is sending me email notifications for all sorts of events (e.g. new PS, comments, votes) even for projects I didn't know existed. email count is up +580 since Gerrit was upgraded | 16:55 |
clarkb | cgoncalves: ya I updated the etherpad with notes on that one | 16:56 |
clarkb | there is a flag that I suspect is related that we can turn off in the gerrit config | 16:56 |
clarkb | cgoncalves: are you a reviewer on those changes? I think that would help confirm it | 16:56 |
clarkb | the ui should show you who it thinks all the reviewers are (people may have added you mistakenly or something) | 16:57 |
cgoncalves | clarkb, I am not a reviewer | 16:58 |
cgoncalves | for example, I received two email notification for https://review.opendev.org/c/openstack/python-tripleoclient/+/757836 | 16:58 |
cgoncalves | I don't follow that project or am a reviewer | 16:58 |
clarkb | ok I just spot checked my gerrit folder and don't see similar. | 16:59 |
fungi | yeah, it's not doing that to me | 17:00 |
clarkb | I'm thinking we check your user's project watches and external ids directly | 17:00 |
clarkb | cgoncalves: if you go to https://review.opendev.org/settings/ what does it say your ID is? | 17:00 |
cgoncalves | clarkb, 6469 | 17:00 |
cgoncalves | clarkb, also https://i.snipboard.io/SotlJ3.jpg | 17:01 |
*** hamalq has joined #opendev | 17:01 | |
clarkb | its because you're in all projects and all users I bet | 17:01 |
clarkb | cgoncalves: any idea why you're in those? | 17:01 |
clarkb | cgoncalves: can you remove the subscription you have for those two repos and see if the problem goes away? | 17:02 |
cgoncalves | clarkb, that came as default. also note that it's only applicable when I am either the owner or reviewer | 17:02 |
*** rpittau is now known as rpittau|afk | 17:03 | |
clarkb | default added back in the 2.x version days or did you add subscriptions on 3.2 and it added those? | 17:03 |
clarkb | anyway I highly suspect it is those because projects inherit from them | 17:03 |
clarkb | I would start by removing those watches and see if it changes anything | 17:03 |
cgoncalves | clarkb, default since I have had my account, years ago. I haven't touched notification settings in a long time | 17:03 |
clarkb | thanks, if that is the problem it's a good thing for us to know as we may need to manually clear those out for people if this fixes it | 17:03 |
fungi | unrelated, but looking at the release job failure i'm also wondering if we want to expand the list of gerrit host keys we add to known_hosts? https://zuul.opendev.org/t/openstack/build/fe46b5286a8145a89e06df95eee2ecf2/console#1/0/26/ubuntu-bionic | 17:04 |
fungi | still trying to work out where that's getting passed in | 17:04 |
clarkb | fungi: I think it may be part of the secret (but a plain text attribute) | 17:05 |
fungi | yeah, i'm just not finding it in codesearch, but that helps me narrow down where to look at least | 17:05 |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: Update git-review test matrix https://review.opendev.org/c/openstack/project-config/+/763808 | 17:08 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Make py35 minimum version required https://review.opendev.org/c/opendev/git-review/+/763803 | 17:10 |
*** sboyron_ has joined #opendev | 17:13 | |
openstackgerrit | Sorin Sbârnea proposed openstack/project-config master: Update git-review test matrix https://review.opendev.org/c/openstack/project-config/+/763808 | 17:14 |
*** sboyron has quit IRC | 17:16 | |
clarkb | cgoncalves: ok I see you've removed those watches, please let us know if the behavior changes | 17:19 |
openstackgerrit | Sorin Sbârnea proposed opendev/git-review master: Allow choosing which field to use as author when naming branch https://review.opendev.org/c/opendev/git-review/+/444574 | 17:20 |
cgoncalves | clarkb, no new emails since 18 minutes ago. I was getting at least one every ~2 minutes | 17:23 |
clarkb | cgoncalves: cool. I think we'll work out how to audit others in that situation and then see if we can fix it for them | 17:23 |
cgoncalves | clarkb, thank you | 17:24 |
*** sboyron__ has joined #opendev | 17:25 | |
*** sboyron_ has quit IRC | 17:29 | |
fungi | aha, https://opendev.org/openstack/project-config/src/branch/master/zuul.d/secrets.yaml#L525-L526 | 17:37 |
fungi | there's a similar entry for proposal_ssh_key in there too | 17:37 |
fungi | i'll double-check whether we should add entries to that | 17:38 |
fungi | clarkb: after you did something similar in nodepool for the fips fix, what's your take on adding multiple keys? | 17:38 |
fungi | just use ssh-keygen to get all the keys being served by the api and stick them all in the secret? | 17:39 |
clarkb | fungi: ya gerrit publishes them iirc | 17:41 |
clarkb | should be reasonable to add them in | 17:41 |
*** hamalq has quit IRC | 17:43 | |
*** hamalq has joined #opendev | 17:43 | |
openstackgerrit | Sorin Sbârnea proposed zuul/zuul-jobs master: Build sphinx with python3 instead https://review.opendev.org/c/zuul/zuul-jobs/+/735923 | 17:46 |
*** mgoddard has quit IRC | 17:54 | |
fungi | i guess gerrit no longer lists its ssh host keys in the account settings view? | 17:57 |
*** weshay|ruck is now known as weshay|interview | 17:57 | |
clarkb | huh seems to be the case | 17:58 |
*** ysandeep is now known as ysandeep|away | 17:59 | |
fungi | how did you go about retrieving all the host keys with nodepool? | 18:00 |
tristanC | clarkb: fungi: digging into the gerrit zuul plugin, it doesn't seem to implement the build result table. And looking at https://gerrit-review.googlesource.com/Documentation/js-api.html it seems relatively simple to render the build result in a table under the commit message | 18:02 |
clarkb | fungi: its super hacky paramiko I wouldn't replicate it | 18:02 |
clarkb | fungi: it starts an ssh connection and does handshaking but client only advertises one valid hostkey type. Then if you handshake successfully you record that hostkey | 18:02 |
clarkb | fungi: I think ssh-keyscan can do it? | 18:02 |
fungi | oh, yep, i was looking at the keygen manpage. d'oh! | 18:03 |
tristanC | i actually started to test the api, and the "showChange" callback provides an object with all the info we need, which seems more stable than the hideci implementation which goes through the dom objects | 18:03 |
zbr | any chance to +W the POLLIN fix for non-linux? https://review.opendev.org/c/opendev/gerritlib/+/729966 | 18:04 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Add known_hosts entries for additional Gerrit keys https://review.opendev.org/c/openstack/project-config/+/763830 | 18:12 |
fungi | i need to step away for a few minutes to do stuff i should have done 5 hours ago when i woke up, back shortly | 18:14 |
*** eolivare has quit IRC | 18:22 | |
sean-k-mooney | by the way i have not checked my third party ci yet but do we need to do anything after the gerrit change | 18:25 |
sean-k-mooney | e.g. restart zuul or similar | 18:25 |
clarkb | sean-k-mooney: your zuul gerrit config needs to use basic http auth instead of digest auth if you have an http gerrit connection set up | 18:26 |
clarkb | basic auth is the default if you just remove the digest auth option | 18:26 |
sean-k-mooney | i think im using an ssh key | 18:27 |
clarkb | if using ssh I don't think there is anything to do | 18:28 |
clarkb | the only change we made was for the http auth option | 18:28 |
sean-k-mooney | http://paste.openstack.org/show/800329/ that's my config more or less and ya it looks like i have a build around 4 | 18:29 |
sean-k-mooney | so i guess its working | 18:29 |
sean-k-mooney | ok ill let ye know if i notice anything odd | 18:31 |
sean-k-mooney | ya the only thing i see in the logs is [Errno None] Unable to connect to port 29418 on 104.130.246.32 or 2001:4800:7819:103:be76:4eff:fe04:9229 | 18:36 |
sean-k-mooney | but that is probably just when ye were doing the reboots | 18:36 |
sean-k-mooney | well restarts | 18:37 |
clarkb | yes we restarted a number of times over the weekend to edit configs | 18:37 |
sean-k-mooney | looks like it was pretty transparent other than that | 18:37 |
sean-k-mooney | hum ok so i might not want to keep 30 days of debug logs though | 18:39 |
sean-k-mooney | 2.7G /var/log/zuul/ | 18:39 |
clarkb | zuul is chatty | 18:39 |
sean-k-mooney | especially when it dumps the gerrit events into the logs at debug level | 18:39 |
sean-k-mooney | it has hit the 30 day limit i guess but the folder its logging to is not on the cinder volume i set up with space for this stuff. i just forgot to turn off debug | 18:42 |
clarkb | we run with debug on because we tend to be zuul beta testers :) | 18:43 |
sean-k-mooney | debug might make sense for the zuul logger but i dont need it for the gerrit one right | 18:44 |
sean-k-mooney | this is what i was going to change it to http://paste.openstack.org/show/800333/ | 18:46 |
clarkb | unless you end up debugging why a specific gerrit event isn't causing jobs to trigger | 18:46 |
sean-k-mooney | although i might move the directory to the cinder volume too | 18:46 |
clarkb | also one thing we do is log debug + everything else to a different file | 18:46 |
clarkb | which allows us to rotate it faster if we want to | 18:46 |
clarkb | you could set up debug + everything else to be a daily log and everything else to be 30 days | 18:46 |
sean-k-mooney | yep i think i copied my logging config from the upstream one | 18:47 |
sean-k-mooney | i guess i could just drop it to 5 days | 18:47 |
sean-k-mooney | honestly if i move it to the cinder volume i really don't mind having the 30 days there | 18:47 |
sean-k-mooney | it's just my root partition is not that large and i don't want it to fill up, but the cinder volume i can always make bigger if i need to | 18:48 |
*** sboyron__ has quit IRC | 18:49 | |
*** sboyron__ has joined #opendev | 18:49 | |
*** dtantsur is now known as dtantsur|afk | 18:49 | |
sean-k-mooney | clarkb: by the way just tried generating a new http password for gertty to use and im getting a 500 internal error for gerrit | 18:58 |
sean-k-mooney | Endpoint: /accounts/self/password.http | 18:59 |
fungi | sean-k-mooney: when doing it in the preferences view in the webui? | 18:59 |
sean-k-mooney | from https://review.opendev.org/settings/#HTTPCredentials | 19:00 |
sean-k-mooney | in the ui yes | 19:00 |
sean-k-mooney | i was just trying to see if gertty worked with new gerrit | 19:00 |
sean-k-mooney | i dont use it often but was just wondering, but the first hurdle was my config was from before the review.openstack.org to review.opendev.org rename and my http password was out of date too | 19:01 |
sean-k-mooney | its been like 2 years since i tried to use it | 19:01 |
clarkb | sean-k-mooney: FileLock invalidated by an external force | 19:02 |
sean-k-mooney | such a descriptive error message | 19:02 |
clarkb | I wonder if lots of people updating accounts is thrashing the locks around the updates | 19:02 |
clarkb | sean-k-mooney: can you try again now and see if it happens again? | 19:03 |
sean-k-mooney | maybe ye reset people's passwords when we had the privilege escalation issue? | 19:03 |
clarkb | we deleted them (but that was on the sql db) | 19:03 |
sean-k-mooney | ya i assume my old one did not work because of something related to that | 19:04 |
sean-k-mooney | although i did not test it after i fixed the url | 19:04 |
sean-k-mooney | ill try it again later in the week | 19:04 |
sean-k-mooney | can gertty use ssh keys by the way? | 19:05 |
clarkb | I don't think so | 19:05 |
sean-k-mooney | ya looking quickly i dont see anything that suggest it does | 19:06 |
sean-k-mooney | which makes sense its connecting to the http api | 19:06 |
sean-k-mooney | with the review now stored in notedb gertty could use that too right | 19:08 |
sean-k-mooney | i mean once its updated | 19:08 |
*** sboyron__ has quit IRC | 19:10 | |
*** sboyron__ has joined #opendev | 19:10 | |
clarkb | infra-root re ^ looking in the error log it is complaining about NativeFSLocks on /var/gerrit/index/accounts_0011/write.lock | 19:15 |
clarkb | I don't see any locks on that file using lslocks from the host side, do we need to docker exec lslocks? | 19:16 |
clarkb | another thought is maybe we trigger an online reindex on accounts? | 19:17 |
fungi | maybe... i'm getting flashbacks of the 2.11(?) upgrade attempt where we had query timeouts causing changes to go missing | 19:17 |
fungi | are there a bunch of those errors? | 19:18 |
*** toma4 has quit IRC | 19:18 | |
*** sboyron__ has quit IRC | 19:19 | |
clarkb | yes, and I was able to induce it by changing my tab width in my preferences | 19:19 |
*** sboyron__ has joined #opendev | 19:19 | |
clarkb | running lslocks -u in the container shows gerrit has locks on other index items but not the accounts one | 19:19 |
clarkb | I also half wonder if this is a stale lockfile because of a restart we did | 19:19 |
clarkb | and gerrit isn't shutting down gracefully and the lock file remains in place and gerrit won't remove it? | 19:20 |
clarkb | hrm unlikely the files in that dir were last updated ~17:04 UTC today | 19:22 |
clarkb | and we last restarted around 23:00UTC yesterday | 19:22 |
clarkb | either something has the lock for valid reasons for a couple hours or something is sad? | 19:22 |
*** toma4 has joined #opendev | 19:25 | |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Use Python 3.x with launchpadlib https://review.opendev.org/c/zuul/zuul-jobs/+/763834 | 19:25 |
*** sboyron__ has quit IRC | 19:25 | |
*** sboyron__ has joined #opendev | 19:25 | |
*** mgoddard has joined #opendev | 19:26 | |
clarkb | it does seem to be persistent. I think we should consider restarting gerrit to see if it can get the lock back? | 19:27 |
clarkb | and maybe trigger an online reindex? | 19:27 |
fungi | you're hoping a reindex will clear file locks? | 19:28 |
clarkb | actually hold on | 19:28 |
clarkb | Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed <- the tracebacks are long but I think this is the real issue | 19:28 |
clarkb | oh wait no I'm just confused by the tracebacks being 500 lines | 19:29 |
clarkb | that error is caused by the lockfile error | 19:29 |
clarkb | looking at logs it started at 17:39 ish | 19:31 |
fungi | i wonder what could have happened around that time | 19:31 |
fungi | 17:39 ish today? | 19:31 |
clarkb | yes | 19:32 |
clarkb | zbr: updated preferences at that time is the first occurrence I find | 19:32 |
clarkb | https://bugs.chromium.org/p/gerrit/issues/detail?id=4400 | 19:32 |
clarkb | that is a fixed bug from a long time ago but similar issue | 19:32 |
clarkb | I think what it is saying is that it doesn't have the lock on that file anymore so it cannot write to the index | 19:35 |
clarkb | but lslocks doesn't seem to show anything has the lock | 19:35 |
clarkb | which is why I suspect if we restart it will startup and grab the lock and be happy again (of course I could be wrong about that) | 19:36 |
*** lpetrut has quit IRC | 19:36 | |
fungi | creationTime=2020-11-21T15:45:53.889402Z | 19:36 |
*** mgoddard has quit IRC | 19:38 | |
fungi | most recent occurrence seems to have been 19:20:05 | 19:38 |
clarkb | the creation time for the same file on review-test is 2020-11-09 | 19:38 |
clarkb | I don't think they are relying on file presence as much as linux fs locks | 19:39 |
fungi | could it be caused by git gc/repack? | 19:41 |
fungi | though looks like we repack daily at 04:17 utc | 19:42 |
clarkb | I doubt it since this is all happening on the lucene side in review_site/index not review_site/git | 19:42 |
fungi | ohhhh | 19:42 |
fungi | /var/gerrit/index right | 19:42 |
clarkb | interestingly it's a warning for some operations on accounts and an error for others | 19:42 |
clarkb | sean-k-mooney: found the error version | 19:42 |
fungi | calling through com.google.gerrit.server.index.change.ReindexAfterRefUpdate.onGitReferenceUpdated | 19:43 |
*** andrewbonney has quit IRC | 19:44 | |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: Switch to container-images for push-to-intermediate-registry https://review.opendev.org/c/zuul/zuul-jobs/+/763836 | 19:44 |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: Switch to container_images for push-to-intermediate-registry https://review.opendev.org/c/zuul/zuul-jobs/+/763836 | 19:44 |
clarkb | at 17:42 track upstream runs | 19:47 |
clarkb | but thats after we notice the problem | 19:47 |
fungi | some other interesting errors in the log too | 19:48 |
fungi | "Account [...] has invalid filter in project watch ProjectWatchKey" | 19:49 |
fungi | lots of "Cannot merge" errors, maybe those refs are broken | 19:50 |
fungi | coming from "Error checking mergeability of ..." various refs for different projects | 19:50 |
clarkb | ya thats openstack/openstack and its sad | 19:51 |
clarkb | but we knew that from before iirc | 19:51 |
fungi | oh, these are all openstack/openstack okay | 19:51 |
fungi | also "Cannot check change kind of new patch set..." related to openstack/openstack | 19:51 |
clarkb | ya so my two thoughts on this are: service restart may allow it to grab a new lock. Reindex may forcefully take the lock (probably only as offline though) | 19:52 |
clarkb | on review-test lslocks shows accounts_0011 is locked by gerrit | 19:53 |
fungi | that's suspicious | 19:53 |
fungi | so yeah maybe the lock is from ages ago? | 19:54 |
clarkb | no sorry I don't mean it to be suspicious | 19:54 |
clarkb | I'm saying review-test is happy and is currently holding a valid lock | 19:54 |
fungi | oh | 19:54 |
clarkb | review is unhappy and lslocks does not have a lock | 19:54 |
clarkb | but I can't find anything that would indicate why review lost its lock | 19:54 |
clarkb | other than maybe track-upstream or manage-projects? | 19:54 |
clarkb | because those bind mount the same dirs into different contexts | 19:55 |
clarkb | maybe docker/containers/linux doesn't like that? | 19:55 |
fungi | could the lock be outside the container? | 19:55 |
fungi | er, for a process outside the container | 19:56 |
clarkb | if I run lslocks outside the container I don't see it either | 19:56 |
fungi | :/ | 19:56 |
fungi | but definitely locks for other files under /var/gerrit/index | 19:56 |
clarkb | ya | 19:57 |
fungi | status notice The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an unexpected error condition, downtime should be less than 5 minutes | 19:58 |
fungi | that ^ work? | 19:58 |
clarkb | wfm | 19:58 |
fungi | #status notice The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an unexpected error condition, downtime should be less than 5 minutes | 19:59 |
openstackstatus | fungi: sending notice | 19:59 |
fungi | you want to down and up -d the container or shall i? | 19:59 |
-openstackstatus- NOTICE: The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an unexpected error condition, downtime should be less than 5 minutes | 19:59 | |
clarkb | can you do it? | 19:59 |
fungi | yep | 19:59 |
clarkb | sudo lslocks shows it has the lock now | 20:00 |
fungi | i agree | 20:00 |
clarkb | I'll update my tab width again | 20:00 |
*** hamalq has quit IRC | 20:01 | |
fungi | java 115703 POSIX 0B WRITE 0 0 0 /var/gerrit/index/accounts_0011/write.lock | 20:01 |
clarkb | fungi: do you want to work on filing a bug or should I? | 20:01 |
fungi | you seem to have more context, but also we don't have a ton of detail yet to include right? | 20:01 |
clarkb | ya we don't have a ton of detail but I think it's better to start this discussion early with them | 20:02 |
openstackstatus | fungi: finished sending notice | 20:02 |
fungi | i can draft something in an etherpad first | 20:02 |
fungi | just to make sure i've got all the context | 20:02 |
*** openstackgerrit has quit IRC | 20:02 | |
clarkb | ++ | 20:03 |
clarkb | I was able to change my tab width setting without issue | 20:03 |
clarkb | and there is no new traceback with my username in the error log | 20:03 |
clarkb | talking out loud here not knowing anything about lucene, it kinda feels like it should try to reacquire the lock after it has lost it | 20:04 |
clarkb | since the system said no one had the lock | 20:04 |
clarkb | that should've been successful | 20:04 |
*** hamalq has joined #opendev | 20:13 | |
*** sboyron__ has quit IRC | 20:13 | |
*** sboyron__ has joined #opendev | 20:13 | |
*** jhesketh has quit IRC | 20:16 | |
*** hamalq has quit IRC | 20:19 | |
*** jhesketh has joined #opendev | 20:23 | |
sean-k-mooney | clarkb: well it seems to work now after the restart | 20:29 |
ianw | o/ | 20:29 |
clarkb | sean-k-mooney: ya we confirmed the gerrit process is holding the file lock and I tested too | 20:31 |
ianw | looks like gerritbot hit the problem @ 2020-11-23 19:59:49,916 DEBUG paramiko.transport: EOF in transport thread | 20:31 |
clarkb | ianw: we restarted gerrit | 20:31 |
clarkb | ianw: but it was doing fine otherwise which leads me to think whatever you did fixed it | 20:31 |
fungi | yeah, though i guess it still needs restarting after gerrit restarts | 20:31 |
ianw | i didn't do anything but run it under gdb :) | 20:31 |
fungi | the timing definitely corresponds with the gerrit restart | 20:32 |
ianw | yeah, it hit the exit breakpoint, but unfortunately there's no python backtrace at that point | 20:32 |
clarkb | ianw: did you install it somewhere else? | 20:32 |
clarkb | or maybe pull a new image or something? | 20:32 |
ianw | well it is running under the debian python 3.7, in the image, not the -slim python that installs in /usr/local/bin/ | 20:33 |
clarkb | ianw: I wonder if that is it | 20:35 |
clarkb | because it was rock solid | 20:35 |
clarkb | does zuul use the same python that gerritbot uses? | 20:35 |
ianw | yeah, i mean the idea is to use /usr/local/bin/python in these containers, the debian python isn't even installed by default | 20:37 |
ianw | i just did that to get a python with symbols | 20:37 |
clarkb | ya I mean are they both using 3.7 or 3.8 or whatever? | 20:37 |
clarkb | as those will be different builds and there could be an issue with whichever one gerritbot has if zuul is different | 20:37 |
ianw | oh, hrm, i think zuul is a 3.8 container now? | 20:37 |
clarkb | ya just confirmed | 20:41 |
clarkb | zuul is 3.8 and gerritbot is 3.7 but otherwise they use the same set of opendev jobs | 20:41 |
ianw | the thing is that it looks to me like it should be trying to re-establish connections when they drop | 20:42 |
ianw | however, instead it goes into a death loop | 20:42 |
clarkb | s/jobs/images/ | 20:42 |
ianw | so we now know that catching the ssh thread @ pthread_exit isn't helpful in seeing where the exception came from | 20:43 |
clarkb | because it isn't exiting but looping? | 20:43 |
ianw | when it ends up in its death loop, the ssh thread has exited, and then it seems to be the other bits around it that are constantly trying to read from fd's that will never return | 20:44 |
clarkb | got it | 20:44 |
ianw | unfortunately though by pthread_exit, python seems to have destroyed all its frames | 20:44 |
ianw | i've restarted it under pdb | 20:50 |
ianw | i've never really used that before | 20:50 |
ianw | we can kill ssh connections via the cli right? perhaps i can try some manual testing simulating disconnects | 20:50 |
clarkb | ya you can show connections then use the connection id to kill them iirc | 20:52 |
fungi | on a related note, i saw that gerrit now claims to immediately disconnect established ssh sockets and invalidate http sessions for any users we disable | 20:54 |
fungi | (it was mentioned in release notes for some recent version) | 20:54 |
ianw | seems sane | 20:54 |
ianw | there's a couple of things in the dib queue i'd like to try and get in (last night there was tripleo issues causing gate job failures) and do a release, and bump in nodepool so we can build centos7 again | 20:55 |
ianw | otherwise, i think the builders are back to being sane | 20:55 |
*** hamalq has joined #opendev | 21:01 | |
fungi | clarkb: https://bugs.chromium.org/p/gerrit/issues/detail?id=13726 | 21:02 |
fungi | i've added it to the pad | 21:04 |
clarkb | lgtm thanks | 21:04 |
clarkb | if anyone knows how to subscribe to bugs in their bug tracker please let me know | 21:04 |
clarkb | maybe if I leave comments then it will do what I want | 21:04 |
fungi | clarkb: apparently if you "vote for" the bug it subscribes you? | 21:04 |
fungi | when i look at the bugs i've opened it says "You have voted for this issue and will receive notifications." | 21:05 |
clarkb | I don't see anything in the ui to vote for the bug though | 21:05 |
clarkb | If you make a comment there is a check mark that says send email | 21:06 |
clarkb | but I don't really want to make a random comment just to get cc'd | 21:06 |
clarkb | oh wait I see the thing | 21:06 |
clarkb | there is a tooltip because this is so confusing | 21:06 |
clarkb | you have to star the bug | 21:06 |
fungi | must have been designed by the same folks who designed the gerrit ui? ;) | 21:07 |
fungi | "starring" seems consistent with how you subscribe to things in gerrit, after all | 21:07 |
fungi | clarkb: seeing a bunch of this in dmesg... maybe related? "aufs au_opts_verify:1597:dockerd[60130]: dirperm1 breaks the protection by the permission bits on the lower branch" | 21:12 |
clarkb | fungi: ya I saw that too but it seems to have been happening for a while | 21:13 |
clarkb | and 2.13 didn't care | 21:13 |
fungi | oh, yep, goes back quite a ways | 21:13 |
clarkb | fungi: I think that is caused by track-upstream | 21:13 |
clarkb | (we should fix it if we can figure it out) | 21:13 |
fungi | anything else you can think of i should add on that new bug report? | 21:13 |
clarkb | fungi: maybe a wondering if lucene/gerrit should try to reacquire the lock since lslocks showed it wasn't held by anything? | 21:17 |
fungi | i can add a comment, sure | 21:18 |
fungi | though that's basically what i meant by "unsure what transpired to kill the lock and prevent it from being reacquired" | 21:18 |
fungi | okay, now i *really* need to step away for a bit. intended to get a shower when i woke up this morning, haven't had time to do that yet, and now i need to start cooking dinner | 21:18 |
clarkb | fungi: oh hey neither have I | 21:21 |
clarkb | https://gerrit-review.googlesource.com/c/gerrit/+/289602 <- something I noticed | 21:21 |
ianw | should be easy to debug, only ~120 functions deep there :) | 21:28 |
clarkb | ianw: ya that's why I pushed a docs bugfix :) | 21:29 |
clarkb | I don't even dare look at the indexer stuff | 21:29 |
*** weshay|interview is now known as weshay|ruck | 21:46 | |
*** slaweq_ has joined #opendev | 21:49 | |
*** slaweq has quit IRC | 21:52 | |
*** hamalq has quit IRC | 21:55 | |
*** hamalq has joined #opendev | 21:56 | |
*** openstackgerrit has joined #opendev | 21:58 | |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Use Python 3.x with launchpadlib https://review.opendev.org/c/zuul/zuul-jobs/+/763834 | 21:58 |
openstackgerrit | Jeremy Stanley proposed zuul/zuul-jobs master: Pin keystoneauth1 when using older Python https://review.opendev.org/c/zuul/zuul-jobs/+/763866 | 21:58 |
*** hamalq has quit IRC | 21:59 | |
*** sboyron__ has quit IRC | 22:03 | |
*** iurygregory has quit IRC | 22:05 | |
*** iurygregory has joined #opendev | 22:06 | |
*** iurygregory has quit IRC | 22:20 | |
clarkb | I'm thinking I won't send out a meeting agenda for tomorrow and instead use the time to recap the gerrit stuff? | 22:31 |
clarkb | part of that is selfish because I'm exhausted | 22:32 |
ianw | seems ok, not sure there are other pressing issues | 22:32 |
ianw | i noticed show-connections "SSHD Backend: nio2" | 22:33 |
ianw | i wonder if there's actually different sshd backends | 22:33 |
ianw | "Starting from version 0.9.0 Apache SSHD project added support for NIO2 IoSession. To use the old MINA session the backend option must be set to MINA" | 22:33 |
ianw | https://opendev.org/opendev/gerrit/commit/fc1ed9cb90e170114a47773dad0c9d8062587c6b | 22:34 |
ianw | looks like we've likely been using that on 2.13, so false alarm | 22:35 |
*** iurygregory has joined #opendev | 22:35 | |
ianw | ok so close-connection on the gerritbot sends it into its loop. should be able to debug from here | 22:37 |
*** openstackgerrit has quit IRC | 22:37 | |
*** slaweq_ has quit IRC | 22:38 | |
*** DSpider has quit IRC | 22:40 | |
tristanC | clarkb: fungi: in https://registry.npmjs.org/@softwarefactory-project/re-gerrit/-/re-gerrit-0.1.0-rc0.tgz you can find a `dist/ZuulResultPlugin.bs.js` file that seems to work when copied to /var/gerrit/plugins/zuul-result.js of docker.io/opendevorg/gerrit:3.2 | 22:40 |
*** openstackgerrit has joined #opendev | 22:58 | |
openstackgerrit | Alex Schultz proposed ttygroup/gertty master: Add version specific changes for git-url https://review.opendev.org/c/ttygroup/gertty/+/763885 | 22:58 |
*** openstackgerrit has quit IRC | 22:59 | |
*** iurygregory_ has joined #opendev | 23:02 | |
*** iurygregory has quit IRC | 23:04 | |
*** iurygregory_ is now known as iurygregory | 23:08 | |
JayF | should gertty, and the shipped example-opendev.yaml file, work with the new gerrit? | 23:14 |
JayF | https://gist.github.com/jayofdoom/2da976e44ea298c7c50531dda250e7c2 unsure if this is user/config error, perhaps the example being for old-gerrit and needing update, or what | 23:14 |
JayF | Hmm. The response from the version endpoint appears busted -- http://review.opendev.org/config/server/version downloads a 'version.json', which has erroneous characters in it | 23:22 |
JayF | It has a )]}' prepended to it, before the version string. I put the exact result from the curl in a comment on the above gist. | 23:23 |
fungi | JayF: make sure you set basic auth now instead of digest | 23:30 |
fungi | that's the only change i needed to make in my config | 23:31 |
JayF | I can do that, but the test with the version endpoint seems unauthenticated | 23:31 |
JayF | so IDK if that's a red herring, but the return from that URL is clearly invalid | 23:31 |
JayF | Hmm. Basic auth isn't set in the example conf I was using, I'll dig in | 23:31 |
JayF | it's syncing now with that change | 23:33 |
JayF | the version endpoint must be a red herring; but it's still super strange and you should probably check to make sure it's what you expect as well | 23:33 |
JayF | I'll push up a PR to the gertty example to add auth-type:basic | 23:33 |
fungi | yeah, we included the necessary setting in the announcement | 23:33 |
fungi | but that was likely easy to skim past | 23:34 |
fungi | and the json csrf buster at the start of gerrit rest api responses is normal, the old version we ran did the same. gertty knows to strip it prior to parsing | 23:35 |
JayF | ack; works for me. | 23:35 |
JayF | and I'm not an "upgrade" of gertty, I' | 23:35 |
JayF | *I'm setting it up for the first time now | 23:36 |
JayF | so I wouldn't have been looking for that config | 23:36 |
JayF | https://review.opendev.org/c/ttygroup/gertty/+/763890 updates the config example upstream | 23:36 |
fungi | oh, cool | 23:40 |
JayF | There are other breakages too, but I see fixes to gertty for them, including https://review.opendev.org/c/ttygroup/gertty/+/763885 | 23:43 |
fungi | if you can confirm they fix bugs for you, that's useful feedback to leave in review comments | 23:44 |
JayF | I was just thinking I should probably install that from source instead of from pypi :| | 23:44 |
JayF | down the rabbithole I go | 23:44 |
fungi | i do run master branch tip and/or additional cherry-picked commits from review to test stuff | 23:45 |
*** hashar has quit IRC | 23:45 | |
JayF | yeah, it makes sense for me too, just was trying this as an alternative to using the new web ui, and as I said, down the rabbithole I go | 23:47 |
JayF | and that does fix my issue, commenting | 23:48 |
fungi | i like gertty in particular because i can run it under tmux on a vm in the cloud and attach to it over mosh from multiple client terminals, like with my other communications tools (mail client, irc client, calendaring, todo list, et cetera) so that allows me to float from machine to machine without losing context | 23:49 |
JayF | honestly I tend to glom on to what I know; the old gerrit web ui was what I knew so it was good enough | 23:50 |
*** openstackgerrit has joined #opendev | 23:50 | |
openstackgerrit | Tristan Cacqueray proposed opendev/system-config master: WIP: gerrit: install zuul-result plugin to recover build table display https://review.opendev.org/c/opendev/system-config/+/763891 | 23:50 |
JayF | if I have to learn a new UI, might as well be an attempted use of gertty | 23:50 |
*** openstackgerrit has quit IRC | 23:59 |
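
Editor's note on the `)]}'` prefix JayF saw on the version endpoint at 23:22, which fungi explained at 23:35: Gerrit puts that guard line in front of its REST API JSON responses, and clients are expected to strip it before parsing. The following is a minimal sketch of that stripping step, not gertty's actual code. The function name `parse_gerrit_json` and the sample version string are hypothetical and exist only for illustration.

```python
import json

# Guard line that Gerrit puts in front of REST API JSON bodies.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Drop the leading )]}' guard if present, then parse the rest as JSON.

    Hypothetical helper for illustration; not taken from gertty.
    """
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    # json.loads tolerates the newline left behind after the prefix.
    return json.loads(body)


if __name__ == "__main__":
    # Shaped like a /config/server/version response; the version value
    # here is a made-up placeholder, not the real server's answer.
    sample = ")]}'\n\"3.2.x-example\"\n"
    print(parse_gerrit_json(sample))
```

Feeding the raw body straight to a JSON parser fails on the guard line, which would explain why the response looked invalid when inspected with curl even though the client handled it fine.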