clarkb | it's definitely taking longer on the second pass | 00:03 |
clarkb | image uploads have completed to both ovh regions and rax-dfw. I suspect we won't really have a good sense for how well this is working until tomorrow? As a reminder to myself need to check that prepare-workspace-git is happy again, then unpause all the image builds and request a noble image build | 01:43 |
clarkb | https://zuul.opendev.org/t/openstack/stream/b71fd32cda0a4807b17f836f7ea339b7?logfile=console.log I think that job is running on an updated jammy node and it appears to have gotten past prepare-workspace-git steps | 02:12 |
clarkb | I can't say with 100% certainty without sshing in and my keys have all expired out so I won't do that now. But so far indications remain good | 02:13 |
clarkb | https://zuul.opendev.org/t/openstack/build/deb867537ddb4423a77d63745d14fbc7/log/job-output.txt#49-50 pretty sure this one also ran successfully on jammy in rax-dfw on the new image | 02:22 |
clarkb | and with that I'll call it a day | 02:22 |
frickler | meh, now the cloud launcher run also fails with: Failed to download remote objects and refs: fatal: detected dubious ownership in repository at '/home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher/.git' | 08:40 |
frickler | so here the issue is the other way around, running as root and cloning a repo owned by zuul. I guess one could argue that the way we are using this is putting full trust into the zuul user and we could as well drop using it and just have root everywhere? | 08:52 |
frickler | in order to finally make some progress on this, I've created /root/.gitconfig with the safe definition of just the above repo and am running the launcher playbook manually now to see if there are any further rogue security group rules | 08:53 |
frickler | bridge01.opendev.org : ok=335 changed=2 unreachable=0 failed=0 skipped=816 rescued=0 ignored=0 | 09:09 |
frickler | the two changes were the new keypairs for the two inmotion tenants. I've now deleted the .gitconfig file again until there is consensus on how we want to proceed with this. note I'll be afk tomorrow to monday | 09:10 |
fungi | it sounds like a straightforward workaround to me | 11:29 |
fungi | i wonder if any of our other deployment jobs are going to run into the same problem | 11:30 |
fungi | and i guess we can't simply chown the repos on bridge because that would break zuul's ability to update them | 12:24 |
fungi | to be clear, what you added to .gitconfig declared /home/zuul/src/opendev.org/opendev as a safe directory? | 12:26 |
fungi | oh, never mind, you said "just the above repo" not "just above the repo" | 12:27 |
fungi | i need more coffee | 12:27 |
fungi | so you specifically set /home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher as safe i guess | 12:27 |
frickler | to be precise, I ran the command that the git error message helpfully contains: "git config --global --add safe.directory /home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher/.git" | 12:30 |
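For reference, that global override amounts to roughly the following; the cleanup at the end is a sketch of an equivalent targeted removal, assuming nothing else had been added to /root/.gitconfig:

```sh
# frickler's workaround: add the repo to root's global safe.directory list so
# git stops rejecting it for "dubious ownership" (repo owned by zuul, command
# run as root).
git config --global --add safe.directory \
    /home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher/.git

# This leaves an entry like the following in /root/.gitconfig:
#   [safe]
#       directory = /home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher/.git

# Removing the override again afterwards (frickler deleted the whole file; a
# targeted cleanup would be):
git config --global --unset-all safe.directory
```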
frickler | this is the full log for context for non roots https://paste.opendev.org/show/b8s8YpCpBoLhYsW3fO75/ | 12:33 |
frickler | for other git errors, opensearch still finds a lot of hits on focal-arm, I guess I can just trigger a rebuild of that image | 12:45 |
frickler | some others are osa related, but iiuc jrosser has been working on that, might still be lacking backports | 12:46 |
frickler | seems like arm builds weren't paused? I also think unpausing other builds should be fine by now? | 12:49 |
frickler | the arm build will have to wait until cs9 is done, though | 12:52 |
Clark[m] | frickler: yes I think we can unpause builds if we're happy with the jammy results. I think we need to trigger a noble build when we do so. I just wasn't able to fully confirm jammy was working properly last night. | 13:34 |
Clark[m] | fungi: you cannot mark a top level dir as safe, you have to be explicit. This is why I decided to go with the chown route on image builds | 13:35 |
Clark[m] | And I think most bridge deployment stuff won't be affected because we don't clone stuff? To be determined though, and that is why I was concerned it would impact the Gerrit upgrade due to firefighting | 13:36 |
mnaser | is `JavaScript promise rejection: null is not an object (evaluating 'chrome.runtime.getURL("").replace'). Open browser console to see more details.` at the top of opened pages something that's known? | 13:36 |
Clark[m] | mnaser: not that I'm aware of. What service is this? | 13:37 |
mnaser | opendev's headers | 13:37 |
mnaser | https://usercontent.irccloud-cdn.com/file/Dz1RnM8X/image.png | 13:37 |
Clark[m] | For https://opendev.org? | 13:37 |
mnaser | yep | 13:38 |
Clark[m] | No I haven't seen that and cannot currently reproduce on my mobile browser | 13:40 |
mnaser | hmm, safari private window doesn't do it | 13:41 |
Clark[m] | mnaser: https://github.com/go-gitea/gitea/issues/29177 | 13:41 |
mnaser | let me try clearing cookies in opendev/etc | 13:41 |
Clark[m] | It's likely one of your plugins seems they identified bitwarden in that issue | 13:41 |
mnaser | it must be 1password | 13:42 |
mnaser | that's the only one enabled | 13:42 |
mnaser | welp, disabled and enabled it and now the error is gone | 13:43 |
frickler | I guess I never noticed how slow arm image builds are. 4.5h seems like a lot of time, mostly doing git updates | 13:45 |
Clark[m] | I wonder if we can set the safe directory flag on specific clone commands with some flag. That way it's a bit more clear where and why we are trusting things | 13:54 |
Clark[m] | Rather than a set-and-forget entry in the global git config that is less traceable | 13:55 |
corvus | you mean, in prepare-workspace-git? | 14:03 |
corvus | oh, i think you mean, rather than running "git config ..." then "git clone" inside of prepare-workspace-git, you mean something like "git --safe-directory clone" inside prepare-workspace-git | 14:05 |
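There is no literal --safe-directory option, but git's -c flag can scope configuration to a single invocation, so the idea would look roughly like this (untested sketch; paths are illustrative):

```sh
# Per-invocation trust instead of a persistent gitconfig entry: -c applies the
# safe.directory setting only to this one git command.
git -c safe.directory=/home/zuul/src/opendev.org/opendev/system-config \
    -C /home/zuul/src/opendev.org/opendev/system-config fetch origin

# The same pattern works for a clone that reads from a cache repo owned by a
# different user (the cache path here is assumed, not taken from the logs):
git -c safe.directory=/opt/git/opendev.org/opendev/system-config \
    clone /opt/git/opendev.org/opendev/system-config \
    /home/zuul/src/opendev.org/opendev/system-config
```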
frickler | I assumed that to refer to the workaround I did this morning, different type of issue than prepare-workspace-git | 14:18 |
frickler | the latter I understood should be resolved by the ownership change of the cache | 14:19 |
corvus | ah that makes sense too | 14:22 |
corvus | another option (i don't know if we discussed this) would be to add each path to a system-wide /etc/gitconfig at the time we build the images | 14:23 |
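Assuming the cached repos live under a single tree on the images, that could be a loop over them at build time; git config --system writes to /etc/gitconfig (untested sketch, cache path illustrative):

```sh
# Hypothetical image-build step: declare every cached repository safe
# system-wide so any user on the node can run git against it.
for repo in /opt/git/opendev.org/*/*; do
    git config --system --add safe.directory "$repo"
done

# Recent git also accepts a wildcard, which trusts everything and is simpler
# but much broader:
#   git config --system --add safe.directory '*'
```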
fungi | yeah, i think if we only have a handful of jobs that want to run as root cloning from zuul-pushed repositories on bridge, we could use a fairly targeted workaround for just those | 14:23 |
corvus | at least, i assume that's an option; needs testing first | 14:24 |
clarkb | oh sorry yup I was referring to the workaround frickler made | 14:31 |
opendevreview | Merged openstack/project-config master: Revert "Temporary update IPE ACL" https://review.opendev.org/c/openstack/project-config/+/904012 | 14:31 |
clarkb | for the few cases where we want to explicitly clone like that, having the override in the command itself would be nice | 14:31 |
clarkb | also note that frickler's override was on bridge and has nothing to do with the test images | 14:32 |
clarkb | I'm reasonably happy with the chown for the test images so far | 14:32 |
corvus | wfm | 14:45 |
fungi | yeah, deploy jobs generally seem to be fine, for example the infra-prod-manage-projects that ran for the change above was successful | 15:06 |
clarkb | I ssh'd into a node running an ironic tempest job that is past the git repo setup and confirmed it has zuul:zuul ownership | 15:06 |
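The chown route on the image builds presumably boils down to something like this (a sketch; the cache path is an assumption):

```sh
# Give the on-image git repo cache to the zuul user so clones and fetches run
# as zuul no longer hit the dubious ownership check.
chown -R zuul:zuul /opt/git
```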
clarkb | I'm going to unpause builds now starting with noble as I want to request a noble build | 15:07 |
fungi | sounds good | 15:09 |
clarkb | and looking back at my notes gentoo images have long been paused so they won't get unpaused. Everything else will be (and is in progress now) | 15:10 |
clarkb | and that should be done now | 15:11 |
clarkb | tonyb: frickler: I think the next step in the key rotation stuff is double checking the cloud launcher actually updated keys in a cloud or three, then we update the nodepool config. Not sure if I had a change pushed for nodepool yet but I'll check | 15:12 |
clarkb | doesn't look like there is a change for that yet | 15:12 |
tonyb | I know most of the puppet stuff is disabled but it looks like it used to run on the wiki server. I know very little about puppet; how can I determine which puppet modules used to run against/on that server? | 15:14 |
clarkb | tonyb: system-config/manifests/site.pp is the top level entry point for puppet. However I think wiki was never really puppeted so won't be in there | 15:15 |
tonyb | I'm specifically looking for whatever is generating the sitemap daily and other things that may be needed | 15:15 |
opendevreview | Clark Boylan proposed opendev/gerritlib master: DNM explicitly test jeepyb + gerritlib against gerrit 3.9 https://review.opendev.org/c/opendev/gerritlib/+/920837 | 15:17 |
tonyb | clarkb: Thanks. I see a puppet-mediawiki module and that's some help | 15:18 |
clarkb | tonyb: I don't think any of that was puppeted in a merged state. But there may be open changes that exposed some of that | 15:18 |
tonyb | clarkb: Thanks | 15:18 |
clarkb | looks like the change to update gerrit to 3.9 properly in our ansible config failed CI. I've rechecked it. I'm beginning to think that tomorrow morning we should try and take stock of where things are at and then decide if we proceed. I think things are looking good today though so probably will be fine | 15:21 |
tonyb | clarkb: Sounds like a fair plan. | 15:22 |
tonyb | clarkb, frickler: Thank you both for keeping the key-rotation process rolling. | 15:23 |
fungi | tonyb: we did at one time briefly have puppet managing users and iptables on the server but aside from that any puppetry you may have found for mediawiki and related bits was a work in progress that only got deployed to wiki-dev test deployments | 15:25 |
opendevreview | Clark Boylan proposed opendev/gerritlib master: DNM explicitly test jeepyb + gerritlib against gerrit 3.9 https://review.opendev.org/c/opendev/gerritlib/+/920837 | 15:27 |
clarkb | jeepyb looks good against gerrit 3.9 thankfully. As soon as the builds for that ps finish and report to gerrit to record that fact, I'm going to push a new ps that generally updates testing for gerritlib while I'm looking at this stuff | 15:42 |
opendevreview | Clark Boylan proposed opendev/gerritlib master: Fixup gerritlib jeepyb integration testing https://review.opendev.org/c/opendev/gerritlib/+/920837 | 15:48 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Add not and tox py312 jobs https://review.opendev.org/c/zuul/zuul-jobs/+/920841 | 15:55 |
opendevreview | Clark Boylan proposed opendev/gerritlib master: Fixup gerritlib jeepyb integration testing https://review.opendev.org/c/opendev/gerritlib/+/920837 | 15:55 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Add nox-py312 job https://review.opendev.org/c/zuul/zuul-jobs/+/920842 | 15:59 |
tonyb | fungi: Okay thanks. | 16:00 |
clarkb | fungi: ha we raced for that one :) | 16:00 |
tonyb | fungi: you and clarkb are clearly on the same page ;P | 16:00 |
fungi | i've got the git-review change about ready to push too | 16:04 |
fungi | i'll abandon 920842 and depends-on 920841 instead | 16:05 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests https://review.opendev.org/c/opendev/git-review/+/920845 | 17:00 |
frickler | looks like a noble rebuild is needed, will pull the trigger | 17:25 |
frickler | ah, already in progress | 17:25 |
frickler | focal-arm64 will take another 2.5h | 17:26 |
clarkb | ya I triggered a noble build as soon as I unpaused it since I was fairly certain it was already affected like jammy but just not having the same impact due to lack of use | 18:05 |
fungi | okay, good news on the git-review front at least | 18:05 |
clarkb | that build appears to be uploading now so should be corrected soon | 18:05 |
fungi | looks like the job failures are for a couple of different reasons, but none are behavior changes in gerrit itself | 18:06 |
fungi | the java on ubuntu-focal nodes is too old to start gerrit 3.9 which is why the py36 job fails, and the git on noble has changed its output when you create branch tracking checkouts which is causing a few tests to fail on a string mismatch | 18:08 |
clarkb | that is reassuring | 18:08 |
fungi | i think probably we don't need to test the full matrix of old/new python and old/new gerrit, so can resolve the java requirement by dropping a couple of jobs | 18:08 |
fungi | and then just make the test regexes loose enough to also work with noble's git | 18:09 |
clarkb | ya and I think if we find out later that people can't use latest git-review against old gerrit we can suggest they downgrade | 18:15 |
clarkb | fungi: another thing worth mentioning is that I believe gerrit 3.10 included updates to the commit message hook for https://gerrit-review.googlesource.com/c/gerrit/+/394841 | 18:16 |
clarkb | I actually could make use of that but I already do squash commits without any special handling so it's not a big deal | 18:16 |
clarkb | I basically git commit -a -m "squash" && git rebase -i HEAD~x and move the squash commit up in the stack, squash it where I want it to go and delete the new change-id in the process | 18:16 |
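Spelled out, that workflow looks roughly like the following; the rebase depth and commit hashes are purely illustrative:

```sh
# Commit the fix as a throwaway "squash" commit on top of the stack.
git commit -a -m "squash"

# Rebase interactively far enough back to reach the change the fix belongs to.
git rebase -i HEAD~3

# In the todo editor, move the throwaway commit up under its target and change
# its action from "pick" to "squash" (or "fixup" to discard its message), e.g.:
#   pick aaaaaaa First change
#   pick bbbbbbb Second change
#   squash ccccccc squash
#   pick ddddddd Third change
#
# When the combined commit message opens, delete the Change-Id the commit hook
# added to the throwaway commit so the original change's Change-Id survives.
```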
clarkb | I think the noble images have all been updated. I'll recheck the two changes I pushed earlier | 18:56 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Add nox and tox py312 jobs https://review.opendev.org/c/zuul/zuul-jobs/+/920841 | 18:57 |
*** clarkb is now known as Guest8089 | 19:49 | |
*** elodilles is now known as elodilles_ooo | 20:04 | |
*** Guest8089 is now known as clarkb | 20:04 | |
clarkb | go to lunch and have the irc connection die on you | 20:04 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests https://review.opendev.org/c/opendev/git-review/+/920845 | 20:11 |
fungi | sometimes irc clients need lunch too | 20:11 |
clarkb | fungi: before you call it a week, it sounds like you're currently comfortable with tonyb and I moving forward with the gerrit upgrade tomorrow? | 20:14 |
fungi | absolutely! | 20:15 |
clarkb | I feel underprepared but I think that is just due to the fires that have cropped up acting like distractions | 20:15 |
clarkb | I've gone through the upgrade process several times on held nodes at this point so should be straightforward | 20:15 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests https://review.opendev.org/c/opendev/git-review/+/920845 | 20:16 |
fungi | yeah, the plan laid out in the etherpad looks good, and the thorough upgrade and downgrade testing suggests there's not much to worry about | 20:17 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Add not and tox py312 jobs https://review.opendev.org/c/zuul/zuul-jobs/+/920841 | 20:26 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Update ansible versions used in unittesting https://review.opendev.org/c/zuul/zuul-jobs/+/920857 | 20:26 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Update ansible versions used in unittesting https://review.opendev.org/c/zuul/zuul-jobs/+/920857 | 20:39 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Add nox and tox py312 jobs https://review.opendev.org/c/zuul/zuul-jobs/+/920841 | 20:39 |
*** clarkb is now known as Guest8094 | 21:15 | |
Guest8094 | again? | 21:16 |
fungi | again! | 21:16 |
Guest8094 | is it affecting others or largely just me? | 21:17 |
fungi | you and one other user timed out at the same moment | 21:17 |
fungi | no, i take that back, it was 26 minutes apart, so just you | 21:17 |
fungi | it's been a ping timeout for your connection both times | 21:17 |
fungi | also seems like your client may be auto-joining before it identifies you to nickserv | 21:18 |
Guest8094 | yes, I don't have sasl set up with oftc because it's a pain | 21:18 |
fungi | i don't think oftc has sasl support | 21:18 |
Guest8094 | ah well that explains it | 21:18 |
fungi | some clients support delaying channel autojoins until you hear confirmation from nickserv | 21:19 |
fungi | oftc does support cert-based connection auth, but yes its support varies a bit by client | 21:19 |
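For anyone wanting to try the cert route, a rough sketch; the NickServ syntax is from memory, so double check OFTC's documentation before relying on it:

```sh
# Generate a long-lived self-signed client certificate (CertFP); filenames,
# key type and lifetime are arbitrary choices.
openssl req -x509 -new -newkey rsa:4096 -sha256 -days 1095 -nodes \
    -keyout oftc-key.pem -out oftc-cert.pem -subj "/CN=yournick"

# Configure your IRC client to present this certificate when connecting to
# irc.oftc.net:6697 over TLS, identify to NickServ with your password once,
# then register the certificate fingerprint so future connections identify
# automatically:
#   /msg NickServ CERT ADD
```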
*** Guest8094 is now known as clarkb | 21:19 | |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests https://review.opendev.org/c/opendev/git-review/+/920845 | 23:08 |
clarkb | somehow I'm still connected to irc | 23:19 |
clarkb | fungi: I guess verifying git output isn't super useful in a git review test? | 23:19 |
fungi | especially verifying that git checkout did what we asked it to do | 23:19 |
fungi | it wasn't a case of confirming that a git-review invocation of git did the right thing, the test was literally just checking behind a literal git checkout subprocess | 23:20 |
fungi | and i think we have enough trust that git will do what we ask it during test setup that if it doesn't then we have bigger issues than just git-review | 23:21 |
clarkb | ++ | 23:22 |
fungi | but mainly, the response it gives to that command on stdout changes so drastically, we'd basically end up checking against multiple possible strings, it's not just a minor difference we can wildcard in one or two spots | 23:24 |
fungi | and some of the existing wildcards in the old regex indicate it's changed more than once over the lifetime of that test | 23:25 |
clarkb | ya seems reasonable to trust git | 23:25 |
clarkb | I think I'm going to pop out a little early today. Tomorrow I'll try to get an early start to ensure everything is ready to go for gerrit upgrading | 23:27 |
fungi | sounds good. i'm around for a bit still, but splitting attention with packing | 23:28 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests https://review.opendev.org/c/opendev/git-review/+/920845 | 23:38 |