openstackgerrit | Merged opendev/system-config master: etherpad: fix robots.txt https://review.opendev.org/c/opendev/system-config/+/763502 | 00:02 |
---|---|---|
clarkb | infra-prod-base failed, a number of instances get an rc -13 during the install exim step | 00:04 |
clarkb | zm04.opendev.org was included in this list, the other 6 new zuul mergers seem fine though | 00:04 |
clarkb | I guess I'll proceed with the other 6 for now then swing back around and check that our daily infra-prod-base gets zm04 this evening? | 00:05 |
clarkb | the ansible doesn't auto start zuul-merger so we can turn them on as we are happy with them | 00:05 |
clarkb | oh actually does launch node run base? | 00:06 |
clarkb | it does, in that case we can proceed with 04 as well since it should be caught up | 00:06 |
prometheanfire | was there a reason to avoid virtualenv-20.4.2? https://review.opendev.org/777171 | 00:07 |
clarkb | prometheanfire: virtualenv made a number of iffy updates recently after the big 20 release | 00:08 |
clarkb | I don't recall all the details but suspect that any pin would be related to that | 00:08 |
clarkb | they ended up fixing a lot of the issues and introducing new regressions. It's possible that back and forth has settled down now | 00:08 |
*** DSpider has quit IRC | 00:09 | |
ianw | ok, i've removed all bup cron jobs | 00:10 |
clarkb | infra-root I cleaned up the non qcow2 images in /opt on bridge. Then realized that /opt is another device so that doesn't help significantly with disk space pressure | 00:10 |
ianw | the old backup server has /opt/backups-201903 and /opt/backups-202007 | 00:10 |
*** tosky has quit IRC | 00:10 | |
ianw | i will remove and free up 201903 backups, and i think i can shrink 202007 to a single tb volume; so that's 5tb of volumes we can free up | 00:11 |
ianw | i'll then attach 202007 to the new server and make a note to delete in say, 6 months | 00:12 |
clarkb | ++ | 00:13 |
clarkb | I'm going to start cleaning up old ansible logs from the first half of last year on bridge | 00:13 |
fungi | sounds great, thanks! | 00:14 |
clarkb | since the logs were the other place I identified much disk use | 00:14 |
clarkb | I think the reason those leak is when the jobs timeout we don't rotate them properly | 00:14 |
clarkb | maybe | 00:14 |
prometheanfire | clarkb: ok, well, I'm gonna approve it, if stuff breaks, feel free to force a revert | 00:15 |
clarkb | ok I think I'll stop perusing and cleaning up old ansible logs there. I haven't completely cleaned things up and that didn't make a huge dent, but it's something | 00:31 |
ianw | ok, 202007 volume shrunk to be just on /dev/xvdb on the old server. shutting down now | 00:32 |
fungi | ianw: interesting note, passing --install-plugin download-commands gets HttpGitReviewTestCase.test_git_review_d passing | 00:35 |
fungi | so i apparently can't drop that | 00:35 |
clarkb | fungi: does git review -d inspect the change to get the download url? | 00:36 |
clarkb | if so that could explain it | 00:36 |
fungi | the tests want to use the change-id rather than the index integer | 00:36 |
fungi | presumably because that's in the commit message | 00:37 |
clarkb | ah | 00:38 |
ianw | i would have thought that was a core plugin, but ... there you go | 00:39 |
*** brinzhang has joined #opendev | 00:40 | |
fungi | rerunning locally to see if that also fixes test_uploads_with_nondefault_rebase | 00:43 |
fungi | also just got hit by another round of privmsg spam | 00:45 |
ianw | yeah, i got some too | 00:45 |
ianw | i really can't imagine the ROI on IRC spam is positive | 00:45 |
ianw | ok, old bup backups are mounted on the new rax.ord server @ /opt/bup-202007 | 00:47 |
ianw | retrieving anything is left as an exercise ... will need some UID munging, python2 environments for bup, etc. | 00:48 |
ianw | #status log Old rax.ord bup backups mounted RO on the new rax.ord borg backup server @ /opt/bup-202007 | 00:48 |
openstackstatus | ianw: finished logging | 00:48 |
fungi | ianw: pretty sure the only goal for any of the irc spammers in the past few years is to discredit and/or drive users off freenode, and to exact revenge for staff actions | 00:48 |
clarkb | ianw: I expect you can run a python2 container that talks to localhost to try and sort that out | 00:49 |
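A rough sketch of the recovery path being discussed, mounting the old backups read-only into a python2-era container rather than going over the network. The image choice, BUP_DIR layout and branch/path names below are assumptions, and the UID munging ianw mentions is left out:

```shell
# debian buster still ships a python2-based bup; mount the old backups read-only
docker run --rm -it -v /opt/bup-202007:/opt/bup-202007:ro debian:buster bash

# inside the container:
apt-get update && apt-get install -y bup
export BUP_DIR=/opt/bup-202007        # or a per-host subdirectory, depending on layout
bup ls /                              # list the backup branches
bup restore -C /tmp/restore /<branch>/latest/<some/path>
```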
*** dmsimard0 has joined #opendev | 00:50 | |
*** dmsimard has quit IRC | 00:51 | |
*** dmsimard0 is now known as dmsimard | 00:51 | |
clarkb | ianw: it's looking like I may need to call it a day before ansible updates the new merger nodes. I don't think that is a big deal since zm01 showed it should go smoothly and it won't auto start the mergers. I can just check them in the morning, start them up, then if they look good after a bit turn off the old ones | 00:51 |
clarkb | tl;dr I don't expect you need to do anything with them | 00:51 |
ianw | ok, i can keep an eye and swap them in too if you like | 00:52 |
clarkb | I wouldn't worry about it, I'm sure you've got other things to look at | 00:52 |
clarkb | I just wanted to let you know that it is still pending but I think you can safely ignore it :) | 00:52 |
fungi | py37: commands succeeded | 00:57 |
fungi | woohoo! | 00:57 |
*** mlavalle has quit IRC | 00:57 | |
clarkb | freenode admins making notices now | 00:58 |
clarkb | fungi: push it :) /me hangs around to review it | 00:59 |
openstackgerrit | Jeremy Stanley proposed opendev/git-review master: Test with Gerrit 2.13 https://review.opendev.org/c/opendev/git-review/+/777268 | 00:59 |
ianw | hrm, the old volumes have gone into "error deleting volume" state | 01:00 |
spotz | Can we turn off PMs?:) | 01:00 |
clarkb | spotz: freenode suggests you set yourself +R to prevent PMs from unidentified users | 01:01 |
ianw | can't even give our TBs away :) | 01:01 |
openstackgerrit | Jeremy Stanley proposed opendev/git-review master: Test/assert Python 3.9 support https://review.opendev.org/c/opendev/git-review/+/772589 | 01:01 |
spotz | Ok let me try that/ Thanks clarkb | 01:02 |
fungi | clarkb: ianw: topic:gitreview-2 has the currently outstanding changes slated for a 2.0.0 release | 01:02 |
fungi | tomorrow i'll work on a python package refresh for it like i did with bindep in 774106 | 01:04 |
fungi | and get that included too | 01:04 |
fungi | otherwise i think it's basically ready (assuming zuul and reviewers concur) | 01:04 |
clarkb | fungi: those changes all lgtm and should have my +2 now. Left a thought on the one that fixes the gerrit bootstrapping though that can be addressed in a followup | 01:07 |
clarkb | and now I need to find dinner | 01:07 |
fungi | thanks! | 01:10 |
fungi | ultimately all that was just in service of being able to test with python 3.9 | 01:11 |
fungi | the whole test framework could stand to be redesigned from the ground up | 01:11 |
ianw | i think it was definitely worth sorting out. tbh i don't think it needs that much re-design | 01:12 |
ianw | but we could include git-review as part of the system-config job too, instead of pushing directly | 01:12 |
ianw | just as another angle | 01:12 |
*** brinzhang has quit IRC | 01:13 | |
ianw | i filed a ticket with rax on our zombie volumes | 01:13 |
*** brinzhang has joined #opendev | 01:15 | |
*** brinzhang has quit IRC | 01:16 | |
*** LowKey has quit IRC | 01:44 | |
*** hamalq has quit IRC | 01:53 | |
*** hamalq has joined #opendev | 01:54 | |
*** zimmerry has quit IRC | 02:02 | |
*** hamalq has quit IRC | 02:08 | |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Revert "Revert "Update upload-logs roles to support endpoint override"" https://review.opendev.org/c/zuul/zuul-jobs/+/776677 | 02:25 |
prometheanfire | well, let's see if the virtualenv update breaks things | 02:30 |
openstackgerrit | Merged opendev/git-review master: Add missing -p/-P/-w/-W/--license to manpage https://review.opendev.org/c/opendev/git-review/+/774567 | 02:37 |
openstackgerrit | Merged opendev/git-review master: Create test projects with positional argument https://review.opendev.org/c/opendev/git-review/+/777260 | 03:14 |
openstackgerrit | Merged opendev/git-review master: Test with Gerrit 2.13 https://review.opendev.org/c/opendev/git-review/+/777268 | 03:15 |
*** brinzhang has joined #opendev | 03:17 | |
*** ysandeep|away is now known as ysandeep|ruck | 03:41 | |
*** ykarel has joined #opendev | 04:11 | |
*** zimmerry has joined #opendev | 04:37 | |
*** jmorgan has quit IRC | 05:18 | |
*** jmorgan has joined #opendev | 05:18 | |
*** dviroel has quit IRC | 05:39 | |
*** whoami-rajat has joined #opendev | 05:41 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: zuul-summary-status : handle SKIPPED jobs https://review.opendev.org/c/opendev/system-config/+/777298 | 05:52 |
*** marios has joined #opendev | 05:57 | |
ykarel | I see mirrors are outdated, can someone check the reason | 05:59 |
ykarel | http://mirror.ord.rax.opendev.org/centos/8-stream/AppStream/x86_64/os/repodata/ vs http://mirror.dal10.us.leaseweb.net/centos/8-stream/AppStream/x86_64/os/repodata/ | 05:59 |
ianw | ykarel: all mirroring logs are @ https://static.opendev.org/mirror/logs/rsync-mirrors/ | 06:00 |
ykarel | ianw, Thanks, checking logs | 06:02 |
ianw | our last run was @ 2021-02-24T04:44:09,716424675+00:00 | 06:04 |
ykarel | ianw, yes, so maybe the source mirror was not consistent at that time? | 06:08 |
ykarel | from https://mirror-status.centos.org/ i see mirror.dal10.us.leaseweb.net @ http 1 hour | 06:08 |
ykarel | renewed | 06:09 |
ykarel | ok | 06:09 |
ianw | yeah, i'm not seeing any rsync errors or errors with us releasing, so i'd say we're getting what upstream was serving | 06:09 |
ykarel | when will be next run? | 06:09 |
ykarel | is it possible to retrigger the run now? | 06:09 |
*** zoharm has joined #opendev | 06:12 | |
ianw | it runs every two hours (https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/tasks/rsync.yaml#L42) | 06:13 |
ianw | ykarel: i'm running a manual run now | 06:14 |
ykarel | ianw, Thanks | 06:15 |
ykarel | i see the contents are updated now, Thanks again ianw | 06:37 |
ianw | ykarel: np | 07:24 |
*** slaweq has joined #opendev | 07:27 | |
*** eolivare has joined #opendev | 07:32 | |
*** sshnaidm|afk is now known as sshnaidm | 07:38 | |
*** ralonsoh has joined #opendev | 07:48 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:00 | |
priteau | Good morning. We had a strange error in gate jobs overnight: "ERROR Failed to update project None in 1s" | 08:02 |
priteau | https://zuul.opendev.org/t/openstack/build/999f096bccbe43c286b7c6ed3c5deeb5 | 08:02 |
*** rpittau|afk is now known as rpittau | 08:11 | |
*** fdegir has joined #opendev | 08:18 | |
*** andrewbonney has joined #opendev | 08:27 | |
*** ykarel_ has joined #opendev | 08:31 | |
*** ykarel has quit IRC | 08:34 | |
*** ykarel_ is now known as ykarel | 08:38 | |
*** jpena|off is now known as jpena | 08:57 | |
*** ysandeep|lunch is now known as ysandeep|ruck | 09:10 | |
*** tosky has joined #opendev | 09:18 | |
*** brinzhang has quit IRC | 10:04 | |
*** brinzhang has joined #opendev | 10:05 | |
*** dtantsur|afk is now known as dtantsur | 10:07 | |
*** ysandeep|ruck is now known as ysandeep|brb | 10:10 | |
*** ysandeep|brb is now known as ysandeep|ruck | 10:24 | |
*** dviroel has joined #opendev | 10:55 | |
*** klonn has joined #opendev | 11:55 | |
*** klonn has quit IRC | 11:57 | |
*** akahat has quit IRC | 12:05 | |
*** akahat has joined #opendev | 12:05 | |
*** eolivare has quit IRC | 12:08 | |
*** jpena is now known as jpena|lunch | 12:38 | |
*** ysandeep|ruck is now known as ysandeep|prgm_ca | 12:59 | |
*** eolivare has joined #opendev | 13:16 | |
fungi | looks like that build started 2021-02-23 23:29:24 utc from ze11 | 13:21 |
fungi | it failed to update a bunch of repositories, logging errors like | 13:23 |
fungi | 2021-02-23 23:29:26,134 ERROR zuul.ExecutorServer: [e: 82190bddba5d471e915725e15ea04a87] [build: 999f096bccbe43c286b7c6ed3c5deeb5] Process pool got broken | 13:24 |
fungi | seems like it was just that build, i don't see "Process pool got broken" logged for any other builds, at least on that executor | 13:25 |
fungi | i'm not finding that on any other executors either | 13:33 |
fungi | so seems to have been a contained incident | 13:33 |
fungi | whatever caused it | 13:33 |
fungi | i'll do some more digging to figure out what codepaths could log that error | 13:33 |
*** dtantsur is now known as dtantsur|brb | 13:35 | |
openstackgerrit | Merged opendev/git-review master: Test/assert Python 3.9 support https://review.opendev.org/c/opendev/git-review/+/772589 | 13:35 |
*** jpena|lunch is now known as jpena | 13:39 | |
*** ysandeep|prgm_ca is now known as ysandeep | 13:51 | |
*** ysandeep is now known as ysandeep|mtg | 13:55 | |
*** rpittau is now known as rpittau|afk | 14:04 | |
*** LowKey has joined #opendev | 14:17 | |
*** dhellmann has quit IRC | 14:30 | |
*** dhellmann has joined #opendev | 14:31 | |
*** LowKey has quit IRC | 14:41 | |
*** LowKey has joined #opendev | 14:41 | |
*** chandankumar has quit IRC | 14:58 | |
*** zoharm has quit IRC | 15:08 | |
*** dhellmann has quit IRC | 15:12 | |
*** lpetrut has joined #opendev | 15:13 | |
*** dtantsur|brb is now known as dtantsur | 15:18 | |
*** ysandeep|mtg is now known as ysandeep | 15:20 | |
*** bhagyashris is now known as bhagyashri|ruck | 15:20 | |
*** dhellmann has joined #opendev | 15:22 | |
*** lpetrut has quit IRC | 15:24 | |
*** chandan_kumar has joined #opendev | 15:29 | |
*** chandan_kumar is now known as chandankumar | 15:32 | |
*** slittle1 is now known as Guest2721 | 15:52 | |
*** elod has quit IRC | 15:58 | |
*** elod has joined #opendev | 16:00 | |
clarkb | I have started zuul-merger on the new mergers and am proceeding with stopping zuul-merger on the old mergers | 16:07 |
clarkb | if things look good in a couple of hours I'll delete the old mergers entirely | 16:07 |
*** mgoddard has joined #opendev | 16:11 | |
*** ysandeep is now known as ysandeep|dinner | 16:12 | |
openstackgerrit | Emma Foley proposed openstack/project-config master: Add infrawatch/functional-tests to available repos https://review.opendev.org/c/openstack/project-config/+/777428 | 16:28 |
*** ysandeep|dinner is now known as ysandeep | 16:33 | |
*** dtantsur is now known as dtantsur|brb | 16:50 | |
fungi | okay, i'm a little confused, the only places i find the string 'Process pool got broken' are in zuul/executor/server.py, and there are two: AnsibleJob.execute() and ExecutorServer._innerUpdateLoop() | 16:50 |
fungi | if BrokenProcessPool is raised in either of those places that message is logged and resetProcessPool() is called. one of them also calls _send_aborted() but the other does not | 16:51 |
fungi | and the latter looks like what we ran into, based on context in the log | 16:52 |
*** toomer has joined #opendev | 16:53 | |
fungi | aha, yeah i wasn't looking at the tracebacks before | 16:54 |
*** marios has quit IRC | 16:54 | |
fungi | so it was raised in _innerUpdateLoop() and the result was a build error, which i suppose is why the build was not automatically retried | 16:54 |
fungi | i'll take this to #zuul | 16:54 |
*** ykarel has quit IRC | 17:05 | |
toomer | I hope somebody from the Infra team can help me with an issue on OpenDev Gerrit (review.opendev.org) | 17:08 |
toomer | When we try to push the new patch to gerrit server we are getting the below error: | 17:08 |
*** gibi has joined #opendev | 17:09 | |
toomer | git push ssh://**********@review.opendev.org:29418/openstack/nova c8478e40bdb996d5e0a1f01ae0ae55e6926f318d:refs/for/master%topic=bug/1909120 | 17:10 |
fungi | toomer: if you're having trouble pasting the error into irc (e.g. it's more than a line or two) then use http://paste.openstack.org/ and stick the url in here | 17:10 |
toomer | error: remote unpack failed: error Missing tree 440f101f5474ed2009b4ced41a31c6673a8a1c80 | 17:10 |
toomer | fatal: Unpack error, check server log | 17:11 |
*** ysandeep is now known as ysandeep|away | 17:11 | |
fungi | i'll check the server's log | 17:11 |
toomer | I'm wondering if there is anything on the server logs (review.opendev.org) corresponding to that push | 17:12 |
toomer | Great, thanks | 17:12 |
fungi | yeah, it definitely logged that. i'll see if that tree is actually missing | 17:12 |
toomer | I can try to push again in order to reproduce, if needed. | 17:13 |
fungi | i can `git show 440f101f5474ed2009b4ced41a31c6673a8a1c80` in the bare repo for openstack/nova on the server's filesystem, so that should be the same one gerrit is referencing | 17:13 |
fungi | i'll see if there are more tea leaves i haven't successfully read in the backtrace it logged | 17:14 |
fungi | toomer: i think we've seen this before when someone pushed a large change series. could that be the case this time too? | 17:18 |
clarkb | fungi: toomer yoctozepto had a similar issue with a different repo previously and the issue was client side aiui | 17:18 |
clarkb | the pack file the client was sending to the remote was not valid essentially | 17:18 |
clarkb | I want to say yoctozepto rebased which caused the local tree to be pushable | 17:18 |
fungi | ahh, the error logged by gerrit makes it sound entirely server-side, but i could see that still being the case | 17:19 |
clarkb | fungi: the hints are in the traceback, the error is happening server side but during receiving and verification of the pack from the remote side iirc | 17:19 |
fungi | it's coming from somewhere in FullConnectivityChecker, so yeah that makes some sense | 17:20 |
fungi | some sort of object disconnect in the packfile being pushed, not in the actual repo gerrit has | 17:21 |
clarkb | yup exactly | 17:21 |
toomer | I can try to rebase this change, but we had that problem before for openstack/tempest and after a while we just tried again and it went through without errors | 17:22 |
fungi | also if you're pushing multiple commits at once, i guess that chance increases. hard to have a disconnect when pushing a single change | 17:22 |
fungi | is this one change, or a series of changes? | 17:22 |
clarkb | I suspect that it is related to git protocl v2 | 17:22 |
clarkb | and there is either a bug in jgit or in cgit | 17:22 |
fungi | oh, yeah maybe | 17:23 |
clarkb | where the two sides don't quite negotiate the exact set of content that needs pushing correctly | 17:23 |
fungi | and rebasing just papers over the problem by rewriting the commits and you get lucky and don't have the same problem | 17:23 |
clarkb | ya | 17:24 |
clarkb | another option may be to force v1 | 17:24 |
clarkb | and see if it goes away since the negotiation is different aiui | 17:25 |
fungi | yep, worth a try | 17:26 |
toomer | This is a single change and is already based on the latest commit from what I see | 17:28 |
toomer | How I can force git protocol v1 ? | 17:29 |
clarkb | `git -c protocol.version=1 push gerrit HEAD:refs/for/master` ? I think I have not tested that | 17:30 |
clarkb | fungi: ^ does that look right for pushing directly without git review? | 17:30 |
toomer | ! [remote rejected] c8478e40bdb996d5e0a1f01ae0ae55e6926f318d -> refs/for/master%topic=bug/1909120 (n/a (unpacker error)) | 17:31 |
clarkb | that is with the command I pasted? | 17:31 |
toomer | Ok, I will try the git protocol version 1 and let you know | 17:32 |
toomer | No | 17:32 |
toomer | Same error | 17:37 |
toomer | http://paste.openstack.org/show/802970/ | 17:37 |
fungi | toomer: is this the only change you're aware of returning this error, or is it generally a problem for any change someone tries to push in nova? | 17:37 |
toomer | We had similar problem with change for openstack/tempest repository | 17:38 |
fungi | when was that? | 17:38 |
clarkb | last time this occurred I also suggested to yoctozepto that a git fsck might be helpful | 17:38 |
clarkb | but I don't think the fsck was done | 17:38 |
toomer | But after some time, we tried it again and it went through | 17:38 |
fungi | toomer: so it's been a while, sounds like. do you know whether it was the exact same error message? | 17:39 |
toomer | Let me check.... | 17:39 |
toomer | 24-Feb-2021 12:19 GMT | 17:40 |
clarkb | it might be worth trying the rebase. Or pushing another change entirely | 17:43 |
clarkb | just to try and narrow this down (rebase was what helped last time) | 17:43 |
clarkb | my v2 protocol suspicion at least seems to be ruled out for now | 17:45 |
fungi | toomer: can you run `git fsck` in your nova tree (it will probably take a few minutes) and see if it reports any errors? | 17:46 |
toomer | Here are the client logs for the tempest change. On the first attempt it failed but after a couple of minutes the same patchset went through | 17:47 |
toomer | http://paste.openstack.org/show/802971/ | 17:47 |
fungi | particularly interested in the "checking connectivity" phase of the fsck | 17:47 |
toomer | sure, will do that, just a sec | 17:48 |
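For reference, the check fungi is asking for is roughly the following, run inside the local nova working tree (the connectivity pass is the part of interest):

```shell
# full client-side integrity check; watch the "Checking connectivity" stage
git fsck --full --strict --progress
```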
clarkb | I wonder too if the remote pack is relying on the local (server) packs to verify connectivity | 17:48 |
clarkb | and maybe gerrit jgit has paged out/uncached/etc the relevant bits for that particular sha | 17:48 |
clarkb | because as you point out it is there in the actual repo | 17:48 |
clarkb | and then maybe it reloads that later and is happy again | 17:48 |
* clarkb just thinking out loud | 17:49 | |
fungi | i do also see the error for the tempest push logged at the following times today: 10:17:55, 10:31:55, 10:34:05, 12:19:41 | 17:50 |
toomer | Is there anything logged for tempest at 24-Feb-2021 12:23 ? | 17:51 |
clarkb | I was able to push remote: https://review.opendev.org/c/openstack/nova/+/777444 DO NOT MERGE testing a thing | 17:52 |
clarkb | (as a sanity check) | 17:53 |
clarkb | that has me thinking to client side state again though | 17:53 |
fungi | toomer: no missing tree errors for tempest changes after 12:19:41 utc | 17:54 |
yoctozepto | toomer, clarkb, fungi: so the issue back then was that it was a stack of the changes; I wanted to modify just the tip and push it | 17:55 |
yoctozepto | but gerrit did not like it | 17:56 |
yoctozepto | so I had to rebase off origin/master | 17:56 |
yoctozepto | (the origin that I just fetched) | 17:56 |
yoctozepto | and then it worked | 17:56 |
clarkb | well I think the problem is more subtle than that. My understanding of the issue is your client sends a pack file with the stuff in it for the push and gerrit verifies its completeness, but if it isn't complete you get this error | 17:57 |
clarkb | it is possible that the reason gerrit sees this as incomplete is due to a gerrit or jgit bug though | 17:57 |
clarkb | not necessarily the client's fault. But I don't think it is specific to trying to modify a stack of changes, more the state of the pack sent by the client to gerrit and whether gerrit can verify connectivity | 17:57 |
fungi | looks like only two users have encountered missing tree errors today according to the gerrit log. one for tempest and another for nova, so the problem is not widespread at least | 17:58 |
clarkb | I had previously suspected that maybe this is a git protocol v2 growing pain but the above test should have used v1 and it failed too | 17:58 |
clarkb | in this case a fsck as well as a rebase and try again are probably the next things to try in order to gather more data | 17:58 |
fungi | though the rebase would have to be forced, sounds like, unless another change merges in the meantime | 17:59 |
*** jpena is now known as jpena|off | 17:59 | |
clarkb | I think git rebase -i pick foo will do it | 17:59 |
clarkb | but maybe not, might need to amend it instead | 18:00 |
*** ralonsoh has quit IRC | 18:00 | |
fungi | yeah, commit --amend and then writing without editing the commit message would update the timestamp and change the commit id at least, so the pack will be different after that | 18:02 |
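A sketch of that workaround, reusing the ssh URL toomer pasted earlier (the username is a placeholder):

```shell
# rewrite the tip commit in place without touching the message; the committer
# timestamp changes, so the commit id and the pack the client builds change too
git commit --amend --no-edit
git push ssh://<username>@review.opendev.org:29418/openstack/nova HEAD:refs/for/master%topic=bug/1909120
```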
clarkb | I pushed a followup change to https://review.opendev.org/c/openstack/nova/+/777444 which modifies one of the files listed in the 440f101f5474ed2009b4ced41a31c6673a8a1c80 object to check if maybe you need to do something that should require connectivity to that in history | 18:02 |
clarkb | it seems to have worked fine | 18:02 |
clarkb | s/followup change/followup patchset/ | 18:02 |
clarkb | I have git 2.30.1 | 18:02 |
clarkb | toomer: might be helpful to record the git client version while we're at it | 18:03 |
fungi | priteau: just to follow up on your error, it looks like the executor to which that build was assigned hit an oom condition around the time it was starting the build and that process got sacrificed. normally zuul should have automatically retried the build but seems there was a corner case in the exception handling in that routine which caused us to report the error instead. i've proposed | 18:04 |
fungi | https://review.opendev.org/777441 to hopefully avoid that in the future | 18:04 |
*** iurygregory_ has joined #opendev | 18:04 | |
*** iurygregory has quit IRC | 18:05 | |
*** iurygregory_ is now known as iurygregory | 18:05 | |
clarkb | fungi: https://review.opendev.org/c/opendev/system-config/+/777206/ I've marked that one active. the mergers seem fine. If we land 777206 the next step will be to remove the old servers | 18:05 |
toomer | The change for nova just went through, without any rebase or updates | 18:25 |
clarkb | interesting | 18:26 |
toomer | http://paste.openstack.org/show/802977/ | 18:27 |
toomer | :-( | 18:27 |
toomer | So we won't get to the bottom of this today | 18:27 |
toomer | I will ask my team to report any issues like that in the future so hopefully we will have better luck next time | 18:28 |
toomer | Thanks for all the support today ! | 18:28 |
clarkb | toomer: what version of git are you using? | 18:29 |
clarkb | just in case that is useful (would be if it is a protocol problem) | 18:29 |
toomer | sec | 18:29 |
clarkb | but that could possibly explain why this isn't widespread if there is a specific version of git that doesn't play nice with gerrit (I don't have evidence of this yet, just trying to collect as much info as possible) | 18:29 |
toomer | git --version # 'git version 2.17.1' | 18:29 |
clarkb | thanks | 18:30 |
clarkb | fungi: fyi https://storyboard.openstack.org/#!/story/2007365 | 18:30 |
toomer | Did that specific gerrit fault happen on any day besides today? | 18:30 |
*** dtantsur|brb is now known as dtantsur | 18:31 | |
clarkb | toomer: yes yoctozepto observed it on an osa repo a few weeks ago. Those are the only incidents I'm aware of | 18:31 |
toomer | It's good it's not widespread. I will need to spend more time investigating it next time. | 18:32 |
fungi | i'll dig in gerrit error logs over the past month and see if i can find more | 18:32 |
fungi | looked like it was gibi hitting the error for the nova change, according to the logs | 18:32 |
toomer | Yes, I'm working with him | 18:33 |
toomer | We are part of the Nordix Foundation | 18:33 |
clarkb | the next time this happens it might be helpful to try pushing with a newer git (to see if that helps) | 18:34 |
clarkb | containers may make that easy ? I don't know | 18:34 |
clarkb | and just continue to try and isolate where the problem is originating | 18:34 |
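One possible way to do the container test clarkb suggests; the image, the ssh key mounting and the safe.directory line are assumptions, not an established opendev recipe:

```shell
# borrow a newer git (>= 2.26 defaults to protocol v2) from a throwaway container
docker run --rm -it -v "$PWD":/repo -v "$HOME/.ssh":/root/.ssh:ro -w /repo fedora:latest bash
# inside the container:
dnf -y install git openssh-clients
git --version
git config --global --add safe.directory /repo   # only needed by newer git; harmless otherwise
git push ssh://<username>@review.opendev.org:29418/openstack/nova HEAD:refs/for/master
```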
fungi | for the missing tree errors today, 750b013 in openstack/tempest and 440f101 in openstack/nova | 18:35 |
toomer | I will talk with gibi tomorrow and see how frequently he is hitting this problem | 18:35 |
fungi | yesterday there was a user getting a missing tree 3e44c04 in openstack/cinder at 02:19:50 and 02:20:41 | 18:36 |
clarkb | 2.17.1 is bionic's git version fwiw | 18:39 |
fungi | thursday of last week (2021-02-18) at 08:49:33 a user got a missing tree 817e601 on openstack/neutron and another user got a missing tree 160fe0c on openstack/nova 11:31:56 that day | 18:41 |
fungi | and missing tree 466cd02 in openstack/nova for a different user at 13:08:31 that day | 18:42 |
toomer | Here is the change that went through in case you are wondering what it looks like | 18:42 |
toomer | https://review.opendev.org/c/openstack/nova/+/777447 | 18:42 |
fungi | weird it went from 13:08:31 friday to 02:19:50 yesterday without happening | 18:42 |
*** andrewbonney has quit IRC | 18:42 | |
fungi | and only seems to be impacting fairly large repositories (tempest, nova, cinder, neutron) | 18:42 |
clarkb | toomer: yes that may be useful too, thanks | 18:42 |
fungi | er, went from 13:08:31 thursday to 02:19:50 yesterday | 18:43 |
clarkb | fungi: could be the jgit on the server side losing track of things due to GC'ing I guess | 18:43 |
fungi | without happening | 18:43 |
fungi | yeah, that's a good theory | 18:43 |
clarkb | though you'd expect it to go look on disk if it doesn't have it in memory | 18:43 |
fungi | maybe the cache winds up out of sync temporarily? | 18:43 |
toomer | How frequently you are GC on gerrit repos ? | 18:44 |
clarkb | ya, that's possible. I know one of the tunables is to make these refs hard not soft refs so they don't get GC'd | 18:44 |
clarkb | toomer: java gc not git gc | 18:44 |
clarkb | I don't suspect git gc here because the objects are in the repo | 18:44 |
clarkb | (you can verify via clone and git show) | 18:44 |
clarkb | but setting those to hard refs implies you need all the memories because you'll never free that memory again | 18:45 |
clarkb | (which is why we haven't done that) | 18:45 |
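For reference, the "verify via clone and git show" check looks like this, using the tree id from today's error:

```shell
git clone https://opendev.org/openstack/nova
cd nova
git cat-file -t 440f101f5474ed2009b4ced41a31c6673a8a1c80   # prints "tree" if the object exists
git show 440f101f5474ed2009b4ced41a31c6673a8a1c80          # lists the tree's entries
```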
yoctozepto | clarkb: it was kolla-ansible actually :-) | 18:49 |
yoctozepto | let me see my git version as well | 18:50 |
yoctozepto | git version 2.18.4 | 18:50 |
yoctozepto | maybe 2.17-2.18 at least are affected | 18:51 |
yoctozepto | if I hit it ever again, I will try a git upgrade... if I don't forget that is! | 18:51 |
toomer | tbh, I don't think it's a git version. The bionic git version is common, so it would affect lots of users | 18:52 |
toomer | if this is the actual issue | 18:52 |
*** eolivare has quit IRC | 18:58 | |
fungi | yeah, i'm seeing enough variety in usernames that i have doubts it's related to the git client version | 19:20 |
*** toomer has quit IRC | 19:28 | |
clarkb | fungi: have time for https://review.opendev.org/c/opendev/system-config/+/777206/ ? (sorry I just want to remove those old servers as that seems prudent before spinning up even more new servers) | 19:44 |
fungi | yeah, sorry, got sidetracked writing a fix for that git review story you linked | 19:44 |
openstackgerrit | Jeremy Stanley proposed opendev/git-review master: Don't test rebasing with unstaged changes https://review.opendev.org/c/opendev/git-review/+/777456 | 19:50 |
fungi | clarkb: ^ i should probably also add a test for that | 19:50 |
clarkb | testing ++ especially when working with the user's working dir | 19:50 |
clarkb | toomer isn't here anymore but I had another thought during my walk earlier. 2.17 and 2.18 are both pre git protocol v2 by default | 19:51 |
clarkb | however, my git client is post git protocol v2 by default | 19:51 |
clarkb | I wonder if this is a v1 issue and using v2 addresses it | 19:51 |
clarkb | it's possible that we don't see it happen frequently because it's a corner case or requires a specific situation. In any case I think testing with git >= 2.26 (when git protocol v2 became the default) would still be helpful if hitting this again | 19:52 |
*** dtantsur is now known as dtantsur|afk | 19:56 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: DNM: testing parallel bindep on multiple hosts https://review.opendev.org/c/zuul/zuul-jobs/+/777462 | 20:14 |
openstackgerrit | Merged opendev/system-config master: Cleanup zm02-08.openstack.org https://review.opendev.org/c/opendev/system-config/+/777206 | 20:21 |
clarkb | fungi: your proposed fix for the git-review thing lgtm, though I think having a test would be a good idea | 20:25 |
fungi | yeah, it's in the works | 20:25 |
fungi | almost there | 20:26 |
*** sshnaidm is now known as sshnaidm|afk | 20:28 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: DNM: testing parallel bindep on multiple hosts https://review.opendev.org/c/zuul/zuul-jobs/+/777462 | 20:29 |
priteau | Thank you fungi for following up on the error | 20:35 |
fungi | priteau: yeah, i'm still not certain how to get the AnsibleJob class context from there to properly abort it, still working on that, but the upshot is it's a very rare condition related to an out of memory event on an executor | 20:37 |
fungi | ideally zuul would have just silently rerun that build rather than reporting that error | 20:38 |
fungi | clarkb: how do we go about increasing test timeout? i keep getting random test failures for git-review because some tests take too long (particularly on my increasingly pathetic and overloaded workstation, but also on some poor-performing job nodes too) | 20:41 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: DNM: testing parallel bindep on multiple hosts https://review.opendev.org/c/zuul/zuul-jobs/+/777462 | 20:41 |
*** LowKey has quit IRC | 20:41 | |
*** LowKey has joined #opendev | 20:41 | |
fungi | clarkb: cancel that. i see "self.useFixture(fixtures.Timeout(2 * 60, True))" already in here | 20:41 |
clarkb | fungi: there should also be an env var that it reads allowing you to bump it up locally | 20:42 |
fungi | well, in this case we need to bump it up in ci as well | 20:42 |
clarkb | fungi: it appears that test suite doesn't do the env var check | 20:44 |
clarkb | but that is a common pattern elsewhere | 20:44 |
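The common pattern clarkb refers to is a base test fixture that honours an OS_TEST_TIMEOUT environment variable; once git-review's suite grows that check, bumping the limit on a slow machine would look something like:

```shell
# hypothetical usage, assuming the suite reads OS_TEST_TIMEOUT like sibling projects do
OS_TEST_TIMEOUT=600 tox -e py37
```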
clarkb | infra-root any objections to me starting to delete zm02.openstack.org - zm08.openstack.org now? | 20:44 |
fungi | clarkb: go for it, i say | 20:44 |
clarkb | 777206 landed and should be the last thing we needed before cleaning them up. The new servers have been in place for about 4 hours now | 20:45 |
clarkb | ok I'll start cleaning those up now. Then tomorrow I guess it's ze01.opendev.org time | 20:46 |
fungi | clarkb: minor hole in my logic on the git-review fix, which fiddling with tests has highlighted for me... if you create a new file and then git add it, the diff will be empty | 20:46 |
fungi | so we need to catch unstaged *and also* staged but uncommitted edits in the worktree | 20:47 |
clarkb | fungi: git diff --cached will show the staged side | 20:48 |
clarkb | not sure if you can do both in one command though | 20:48 |
clarkb | 02-04 are done now. My computer just started installing a billion package updates so I'll pause and wait for that to complete (big updates like this tend to flap networking and trying to do scary things like deletes when that happens is only going to cause me confusion) | 20:50 |
fungi | clarkb: oh, yeah thanks. i can always run both at least | 20:53 |
fungi | clarkb: an alternative is to do git status --porcelain and parse the output | 20:54 |
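A few shell-level ways to catch both cases fungi describes (unstaged edits, and staged-but-uncommitted ones, including freshly added files):

```shell
git diff --quiet          || echo "unstaged changes present"
git diff --cached --quiet || echo "staged but uncommitted changes present"
# or both at once, comparing index and worktree against HEAD:
git diff --quiet HEAD     || echo "worktree differs from HEAD"
# or parse machine-readable status (untracked files excluded):
git status --porcelain --untracked-files=no
```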
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: DNM: testing parallel bindep on multiple hosts https://review.opendev.org/c/zuul/zuul-jobs/+/777462 | 20:58 |
clarkb | fungi: ianw: looks like the gitea backups may still be complaining. Have we looked at that yet? | 21:07 |
fungi | i thought the fixes for gitea01 and translate merged | 21:07 |
clarkb | ah maybe they did, I just noticed the email from last night | 21:07 |
fungi | gitea01 complained again at 05:56:03 today | 21:08 |
fungi | no error from translate this time | 21:08 |
clarkb | also these updates totally did the name lookups stop working thing like they did on the other host. I'm going to reboot in a bit | 21:08 |
clarkb | that is a fun way to force people to reboot after updating, break dns | 21:08 |
fungi | 777037 was supposed to fix that, and merged 07:23 yesterday | 21:10 |
*** whoami-rajat has quit IRC | 21:12 | |
clarkb | fungi: so we wait and see if it happens again? | 21:16 |
fungi | clarkb: well, i'm sort of wondering if it didn't get applied (fix merged ~22.5 hours before the error was sent out), or if it was masking some other problem | 21:22 |
* clarkb is having a hard time with timestamps today. | 21:23 | |
clarkb | I now see you said error was today fix was yesterday | 21:23 |
clarkb | #status log Replaced zm01-08.openstack.org with new zm01-08.opendev.org servers running on focal | 21:24 |
openstackstatus | clarkb: finished logging | 21:24 |
clarkb | Looking at the executors: I notice we need to launch them specifying that launch-node should use the ephemeral drive for /var/lib/zuul. Is there anything else special to consider about those? afs maybe? | 21:27 |
clarkb | we have focal packages for openafs in our ppa | 21:28 |
clarkb | Maybe we spin up a ze01.opendev.org then double check it happily does afs things before rolling out more focal executors | 21:28 |
fungi | probably a good idea to double-check, yeah, though mirror-update.o.o is already focal and does afs writes | 21:32 |
fungi | so the executors will probably be fine | 21:32 |
ianw | clarkb: let me see | 21:34 |
ianw | Creating archive at "ssh://borg-gitea01@backup02.ca-ymq-1.vexxhost.opendev.org/opt/backups/borg-gitea01/backup::gitea01-mysql-2021-02-24T05:53:50" | 21:36 |
ianw | mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `action` at row: 7057 | 21:36 |
ianw | that seems like a legitimate failure, why though ... | 21:36 |
clarkb | that should all be on localhost too | 21:38 |
clarkb | maybe mariadb updated at that same time? | 21:38 |
clarkb | we auto update it iirc | 21:38 |
*** slaweq has quit IRC | 21:48 | |
ianw | maybe ... | 21:57 |
ianw | for some reason gitea logs go to both /var/log/syslog and /var/log/containers/docker-gitea.log | 22:00 |
clarkb | ianw: the access logs go to /var/gitea/logs/ too iirc | 22:00 |
clarkb | but not to the docker or syslog logs iirc | 22:00 |
ianw | the mariadb container is 41 hours old ... i don't think that lines up with the backup failure | 22:02 |
ianw | Aborted connection 33047 to db: 'gitea' user: 'root' host: 'localhost' (Got an error writing communication packets) | 22:03 |
ianw | whatever that means | 22:03 |
*** hamalq has joined #opendev | 22:11 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: install-docker: move rsyslog handler earlier https://review.opendev.org/c/opendev/system-config/+/777476 | 22:20 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: install-docker: remove fix from prior change https://review.opendev.org/c/opendev/system-config/+/777477 | 22:20 |
*** mlavalle has joined #opendev | 22:36 | |
openstackgerrit | Ian Wienand proposed openstack/project-config master: zuul-results-summary: set default branch https://review.opendev.org/c/openstack/project-config/+/777480 | 23:08 |
ianw | corvus: ^ i think that's maybe about right | 23:08 |
corvus | ianw: +2 lemme know how it goes :) | 23:09 |
ianw | i don't think i've used depends-on: for that project before ... so it's just been cloning as part of "required-projects" where it has the override-checkout to main | 23:09 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: zuul-results-summary: set default branch https://review.opendev.org/c/openstack/project-config/+/777480 | 23:23 |
corvus | ianw: wait is that the right tenant? | 23:36 |
corvus | ianw: i thought it was opendev tenant; is it openstack? | 23:36 |
fungi | which reminds me, i need to also work on the change to move git-review into the opendev tenant | 23:38 |
clarkb | we're incorporating that results summary plugin on the gerrit images which are built out of the openstack tenant | 23:40 |
clarkb | I think? | 23:40 |
ianw | corvus: yeah, it's running on the system-config jobs | 23:40 |
ianw | although ... https://opendev.org/openstack/project-config/src/branch/master/zuul/main.yaml#L64 | 23:42 |
*** stewie925 has joined #opendev | 23:43 | |
stewie925 | hello opendev support I have a question | 23:43 |
stewie925 | earlier I just attempted to re-associate my opendev account from work email to personal email | 23:44 |
stewie925 | however I noticed that my opendev commit is not pointing to my personal email address - could anyone help? | 23:45 |
clarkb | stewie925: you mean your git commits have the wrong email address in them? | 23:45 |
stewie925 | yes sir | 23:45 |
clarkb | that is configured on the client side with `git config` and git applies those settings when making commits. Something like `git config --global user.email 'foo@bar.com'` will likely fix it for you | 23:46 |
clarkb | note that old commits won't be updated (I think if you amend them after that update it may fix it though) | 23:46 |
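A sketch of the sequence clarkb is describing; --reset-author is an assumption about how to make the amend pick up the new address, and the email is a placeholder:

```shell
# use the personal address for all future commits
git config --global user.email 'you@example.org'
# re-stamp the current commit with the new author identity, keeping the message
git commit --amend --reset-author --no-edit
```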
stewie925 | sorry if you allow me to share... | 23:47 |
stewie925 | I tried to do it using the Opendev Settings - adding an email address and then setting it to 'Preferred' | 23:48 |
stewie925 | and I clicked on Send Verification - nothing happened (no email). I used a gmail email address for this first try | 23:50 |
clarkb | did you check your spam dir or similar? | 23:51 |
ianw | corvus: hrm, now i'm thinking it does belong in opendev. but also that we need to exclude it from system-required jobs | 23:52 |
stewie925 | on the second try, I added a yahoo email address in the Settings > Email Address . Then I clicked on Send Verification. This time I got an email from Ubuntu One. I followed that link and presto - I see my Opendev account switched to my Yahoo email from my work email. | 23:52 |
stewie925 | clarkb, I did check my spam folder in gmail, there was nothing... | 23:52 |
clarkb | stewie925: ubuntu one shouldn't be involved in that | 23:52 |
clarkb | the email addresses you tell gerrit to send alerts to are largely independent of ubuntu one (the single address associated with the openid is auto filled for you though) | 23:53 |
stewie925 | but anyway clarkb - when I looked at my commits - they're associated with my GMAIL account surprisingly | 23:53 |
clarkb | stewie925: the commits should all be associated with whatever email is in the commit itself | 23:53 |
clarkb | stewie925: can you check if git log shows the gmail account in the commit? | 23:54 |
clarkb | gerrit should reflect that if so | 23:54 |
stewie925 | the commits were originally associated with my work email (I left work a few months back) | 23:55 |
openstackgerrit | Ian Wienand proposed opendev/project-config master: zuul-results-summary: set default branch https://review.opendev.org/c/opendev/project-config/+/777485 | 23:55 |
stewie925 | but when I tried this thing - changing the preferred email thing - the commits now changed from my old work email to my Gmail email (the failed first try) | 23:55 |
clarkb | stewie925: maybe you can be more specific about what you mean when you say "associated with my gmail account" | 23:55 |
clarkb | because gerrit won't rewrite commits like that | 23:56 |
clarkb | I think I'm not understanding | 23:56 |
stewie925 | when I try to filter my commits - I do owner:self | 23:56 |
clarkb | oh changes not commits | 23:57 |
clarkb | stewie925: did you change your email to yahoo in ubuntu one? | 23:57 |
stewie925 | oh sorry, changes | 23:57 |
clarkb | I guess that would explain why ubuntu one sent you a verification email | 23:57 |
stewie925 | ohhh hmmm | 23:57 |
stewie925 | well yeah in the meantime I sent them an email before I came online | 23:58 |
clarkb | if you did that, then there is a good chance ubuntu one created a new openid for you and when you logged in after that gerrit created a new account for the new openid | 23:58 |
clarkb | which could explain the split | 23:58 |
clarkb | (unfortunately, we're in the middle of trying to fix ~650 preexisting issues of this sort so that we can more easily manage these problems in the future, but there is a lot to sort through and we haven't quite made it there) | 23:59 |