Thursday, 2024-05-30

00:03 <clarkb> it's definitely taking longer on the second pass
01:43 <clarkb> image uploads have completed to both ovh regions and rax-dfw. I suspect we won't really have a good sense for how well this is working until tomorrow? As a reminder to myself: need to check that prepare-workspace-git is happy again, then unpause all the image builds and request a noble image build
02:12 <clarkb> https://zuul.opendev.org/t/openstack/stream/b71fd32cda0a4807b17f836f7ea339b7?logfile=console.log I think that job is running on an updated jammy node and it appears to have gotten past the prepare-workspace-git steps
02:13 <clarkb> I can't say with 100% certainty without sshing in, and my keys have all expired out so I won't do that now. But so far indications remain good
02:22 <clarkb> https://zuul.opendev.org/t/openstack/build/deb867537ddb4423a77d63745d14fbc7/log/job-output.txt#49-50 pretty sure this one also ran successfully on jammy in rax-dfw on the new image
02:22 <clarkb> and with that I'll call it a day
08:40 <frickler> meh, now the cloud launcher run also fails with: Failed to download remote objects and refs:  fatal: detected dubious ownership in repository at '/home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher/.git'
08:52 <frickler> so here the issue is the other way around: running as root and cloning a repo owned by zuul. I guess one could argue that the way we are using this is putting full trust into the zuul user, and we could just as well drop using it and have root everywhere?
08:53 <frickler> in order to finally make some progress on this, I've created /root/.gitconfig with the safe definition for just the above repo and am running the launcher playbook manually now to see if there are any further rogue security group rules
09:09 <frickler> bridge01.opendev.org       : ok=335  changed=2    unreachable=0    failed=0    skipped=816  rescued=0    ignored=0
09:10 <frickler> the two changes were the new keypairs for the two inmotion tenants. I've now deleted the .gitconfig file again until there is consensus on how we want to proceed with this. note I'll be afk from tomorrow to monday
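
A minimal sketch of the workaround frickler describes above, using the command git itself suggests (quoted verbatim at 12:30 below) plus the cleanup step:

    # mark just this one repository as safe for root
    git config --global --add safe.directory \
        /home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher/.git

    # this leaves /root/.gitconfig containing:
    #   [safe]
    #       directory = /home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher/.git

    # remove the override again once the playbook run is done
    rm /root/.gitconfig
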
11:29 <fungi> it sounds like a straightforward workaround to me
11:30 <fungi> i wonder if any of our other deployment jobs are going to run into the same problem
12:24 <fungi> and i guess we can't simply chown the repos on bridge because that would break zuul's ability to update them
12:26 <fungi> to be clear, what you added to .gitconfig declared /home/zuul/src/opendev.org/opendev as a safe directory?
12:27 <fungi> oh, never mind, you said "just the above repo" not "just above the repo"
12:27 <fungi> i need more coffee
12:27 <fungi> so you specifically set /home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher as safe i guess
12:30 <frickler> to be precise, I ran the command that the git error message helpfully contains: "git config --global --add safe.directory /home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher/.git"
12:33 <frickler> this is the full log for context for non-roots https://paste.opendev.org/show/b8s8YpCpBoLhYsW3fO75/
12:45 <frickler> for other git errors, opensearch still finds a lot of hits on focal-arm, I guess I can just trigger a rebuild of that image
12:46 <frickler> some others are osa related, but iiuc jrosser has been working on that, might still be lacking backports
12:49 <frickler> seems like arm builds weren't paused? I also think unpausing other builds should be fine by now?
12:52 <frickler> the arm build will have to wait until cs9 is done, though
13:34 <Clark[m]> frickler: yes I think we can unpause builds if we're happy with the jammy results. I think we need to trigger a noble build when we do so. I just wasn't able to fully confirm jammy was working properly last night.
13:35 <Clark[m]> fungi: you cannot mark a top-level dir as safe, you have to be explicit. This is why I decided to go with the chown route on image builds
13:36 <Clark[m]> And I think most bridge deployment stuff won't be affected because we don't clone stuff? To be determined though, and that is why I was concerned it would impact the Gerrit upgrade due to firefighting
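
The "chown route" mentioned here amounts to roughly the following; the cache path is an assumption, since the log only confirms the repos end up zuul:zuul owned (see 15:06 below):

    # hypothetical image-build step: hand the cached git repos to the user
    # the jobs run as, so no safe.directory entries are needed at all
    chown -R zuul:zuul /opt/git
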
13:36 <mnaser> is `JavaScript promise rejection: null is not an object (evaluating 'chrome.runtime.getURL("").replace'). Open browser console to see more details.` at the top of opened pages something that's known?
13:37 <Clark[m]> mnaser: not that I'm aware of. What service is this?
13:37 <mnaser> opendev's headers
13:37 <mnaser> https://usercontent.irccloud-cdn.com/file/Dz1RnM8X/image.png
13:37 <Clark[m]> For https://opendev.org?
13:38 <mnaser> yep
13:40 <Clark[m]> No, I haven't seen that and cannot currently reproduce on my mobile browser
13:41 <mnaser> hmm, safari private window doesn't do it
13:41 <Clark[m]> mnaser: https://github.com/go-gitea/gitea/issues/29177
13:41 <mnaser> let me try clearing cookies in opendev/etc
13:41 <Clark[m]> It's likely one of your plugins; seems they identified bitwarden in that issue
13:42 <mnaser> it must be 1password
13:42 <mnaser> that's the only one enabled
13:43 <mnaser> welp, disabled and enabled it and now the error is gone
13:45 <frickler> I guess I never noticed how slow arm image builds are. 4.5h seems like a lot of time, mostly doing git updates
13:54 <Clark[m]> I wonder if we can set the safe directory flag on specific clone commands with some flag. That way it's a bit more clear where and why we are trusting things
13:55 <Clark[m]> Rather than a set-and-forget in global git config that is less traceable
14:03 <corvus> you mean, in prepare-workspace-git?
14:05 <corvus> oh, i think you mean: rather than running "git config ..." then "git clone" inside of prepare-workspace-git, something like "git --safe-directory clone"
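
As an aside, git has no literal --safe-directory option, but -c counts as the "protected configuration" in which safe.directory is honored, so a per-invocation override could look like this untested sketch:

    # scope the trust to a single command instead of a global config entry
    git -c safe.directory=/home/zuul/src/opendev.org/opendev/ansible-role-cloud-launcher/.git \
        fetch origin
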
14:18 <frickler> I assumed that to refer to the workaround I did this morning, a different type of issue than prepare-workspace-git
14:19 <frickler> the latter I understood should be resolved by the ownership change of the cache
14:22 <corvus> ah that makes sense too
14:23 <corvus> another option (i don't know if we discussed this) would be to add each path to a system-wide /etc/gitconfig at the time we build the images
14:23 <fungi> yeah, i think if we only have a handful of jobs that want to run as root cloning from zuul-pushed repositories on bridge, we could use a fairly targeted workaround for just those
14:24 <corvus> at least, i assume that's an option; needs testing first
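
Untested, as corvus notes, but that option would presumably amount to something like this at image-build time (the cache path here is illustrative):

    # seed the system-wide /etc/gitconfig with one entry per cached repo
    for repo in /opt/git/opendev.org/*/*; do
        git config --system --add safe.directory "$repo"
    done
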
14:31 <clarkb> oh sorry, yup, I was referring to the workaround frickler made
14:31 <opendevreview> Merged openstack/project-config master: Revert "Temporary update IPE ACL"  https://review.opendev.org/c/openstack/project-config/+/904012
14:31 <clarkb> for the few cases where we want to explicitly clone like that, having the override in the command itself would be nice
14:32 <clarkb> also note that frickler's override was on bridge and has nothing to do with the test images
14:32 <clarkb> I'm reasonably happy with the chown for the test images so far
14:45 <corvus> wfm
15:06 <fungi> yeah, deploy jobs generally seem to be fine; for example the infra-prod-manage-projects that ran for the change above was successful
15:06 <clarkb> I ssh'd into a node running an ironic tempest job that is past the git repo setup and confirmed it has zuul:zuul ownership
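
A spot check like that might look as follows; the node address and repo path are placeholders:

    # confirm the prepared repos belong to the zuul user
    ssh zuul@<node> stat -c '%U:%G' /home/zuul/src/opendev.org/opendev/system-config
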
15:07 <clarkb> I'm going to unpause builds now, starting with noble as I want to request a noble build
15:09 <fungi> sounds good
15:10 <clarkb> and looking back at my notes, gentoo images have long been paused so they won't get unpaused. Everything else will be (and is in progress now)
15:11 <clarkb> and that should be done now
15:12 <clarkb> tonyb: frickler: I think the next step in the key rotation stuff is double checking the cloud launcher actually updated keys in a cloud or three, then we update the nodepool config. Not sure if I had a change pushed for nodepool yet but I'll check
15:12 <clarkb> doesn't look like there is a change for that yet
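
One way to double check the launcher's keypair updates, assuming openstackclient access to the clouds in question:

    # list keypairs and compare the rotated key's fingerprint
    openstack keypair list
    openstack keypair show <keypair-name> -f value -c fingerprint
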
15:14 <tonyb> I know most of the puppet stuff is disabled but it looks like it used to run on the wiki server. I know very little about puppet; how can I determine which puppet modules used to run against/on that server?
15:15 <clarkb> tonyb: system-config/manifests/site.pp is the top-level entry point for puppet. However I think wiki was never really puppeted so won't be in there
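
Purely illustrative, but a quick grep of that entry point shows which node blocks and classes applied to a given host:

    # node definitions in site.pp are keyed by hostname patterns
    grep -n -B1 -A10 'wiki' system-config/manifests/site.pp
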
15:15 <tonyb> I'm specifically looking for whatever is generating the sitemap daily, and other things that may be needed
15:17 <opendevreview> Clark Boylan proposed opendev/gerritlib master: DNM explicitly test jeepyb + gerritlib against gerrit 3.9  https://review.opendev.org/c/opendev/gerritlib/+/920837
15:18 <tonyb> clarkb: Thanks. I see a puppet-mediawiki module and that's some help
15:18 <clarkb> tonyb: I don't think any of that was puppeted in a merged state. But there may be open changes that exposed some of that
15:18 <tonyb> clarkb: Thanks
15:21 <clarkb> looks like the change to update gerrit to 3.9 properly in our ansible config failed CI. I've rechecked it. I'm beginning to think that tomorrow morning we should take stock of where things are at and then decide if we proceed. I think things are looking good today though, so probably will be fine
15:22 <tonyb> clarkb: Sounds like a fair plan.
15:23 <tonyb> clarkb, frickler: Thank you both for keeping the key-rotation process rolling.
15:25 <fungi> tonyb: we did at one time briefly have puppet managing users and iptables on the server, but aside from that any puppetry you may have found for mediawiki and related bits was a work in progress that only got deployed to wiki-dev test deployments
15:27 <opendevreview> Clark Boylan proposed opendev/gerritlib master: DNM explicitly test jeepyb + gerritlib against gerrit 3.9  https://review.opendev.org/c/opendev/gerritlib/+/920837
15:42 <clarkb> jeepyb looks good against gerrit 3.9, thankfully. As soon as the builds for that ps finish and report to gerrit to record that fact, I'm going to push a new ps that generally updates testing for gerritlib while I'm looking at this stuff
15:48 <opendevreview> Clark Boylan proposed opendev/gerritlib master: Fixup gerritlib jeepyb integration testing  https://review.opendev.org/c/opendev/gerritlib/+/920837
15:55 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Add not and tox py312 jobs  https://review.opendev.org/c/zuul/zuul-jobs/+/920841
15:55 <opendevreview> Clark Boylan proposed opendev/gerritlib master: Fixup gerritlib jeepyb integration testing  https://review.opendev.org/c/opendev/gerritlib/+/920837
15:59 <opendevreview> Jeremy Stanley proposed zuul/zuul-jobs master: Add nox-py312 job  https://review.opendev.org/c/zuul/zuul-jobs/+/920842
16:00 <tonyb> fungi: Okay thanks.
16:00 <clarkb> fungi: ha, we raced for that one :)
16:00 <tonyb> fungi: you and clarkb are clearly on the same page ;P
16:04 <fungi> i've got the git-review change about ready to push too
16:05 <fungi> i'll abandon 920842 and depends-on 920841 instead
17:00 <opendevreview> Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests  https://review.opendev.org/c/opendev/git-review/+/920845
17:25 <frickler> looks like a noble rebuild is needed, will pull the trigger
17:25 <frickler> ah, already in progress
17:26 <frickler> focal-arm64 will take another 2.5h
18:05 <clarkb> ya, I triggered a noble build as soon as I unpaused it since I was fairly certain it was already affected like jammy, just not having the same impact due to lack of use
18:05 <fungi> okay, good news on the git-review front at least
18:05 <clarkb> that build appears to be uploading now so should be corrected soon
18:06 <fungi> looks like the job failures are for a couple of different reasons, but none are behavior changes in gerrit itself
18:08 <fungi> the java on ubuntu-focal nodes is too old to start gerrit 3.9, which is why the py36 job fails, and the git on noble has changed its output when you create branch tracking checkouts, which is causing a few tests to fail on a string mismatch
18:08 <clarkb> that is reassuring
18:08 <fungi> i think probably we don't need to test the full matrix of old/new python and old/new gerrit, so can resolve the java requirement by dropping a couple of jobs
18:09 <fungi> and then just make the test regexes loose enough to also work with noble's git
18:15 <clarkb> ya, and I think if we find out later that people can't use latest git-review against old gerrit we can suggest they downgrade
18:16 <clarkb> fungi: another thing worth mentioning is that I believe gerrit 3.10 included updates to the commit message hook for https://gerrit-review.googlesource.com/c/gerrit/+/394841
18:16 <clarkb> I actually could make use of that, but I already do squash commits without any special handling so it's not a big deal
18:16 <clarkb> I basically git commit -a -m "squash" && git rebase -i HEAD~x and move the squash commit up in the stack, squash it where I want it to go, and delete the new change-id in the process
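
Spelled out, that workflow is roughly the following; HEAD~3 and the todo edits are per-situation:

    git commit -a -m "squash"   # commit the fixup on top of the stack
    git rebase -i HEAD~3        # open the todo list for the recent commits
    # in the todo list: move the "squash" line up under the target commit and
    # change its action from "pick" to "squash"; when the combined commit
    # message opens, delete the newly generated Change-Id line so gerrit
    # still treats it as the same change
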
18:56 <clarkb> I think the noble images have all been updated. I'll recheck the two changes I pushed earlier
18:57 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Add nox and tox py312 jobs  https://review.opendev.org/c/zuul/zuul-jobs/+/920841
19:49 *** clarkb is now known as Guest8089
20:04 *** elodilles is now known as elodilles_ooo
20:04 *** Guest8089 is now known as clarkb
20:04 <clarkb> go to lunch and have the irc connection die on you
20:11 <opendevreview> Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests  https://review.opendev.org/c/opendev/git-review/+/920845
20:11 <fungi> sometimes irc clients need lunch too
20:14 <clarkb> fungi: before you call it a week, it sounds like you're currently comfortable with tonyb and I moving forward with the gerrit upgrade tomorrow?
20:15 <fungi> absolutely!
20:15 <clarkb> I feel underprepared, but I think that is just due to the fires that have cropped up acting like distractions
20:15 <clarkb> I've gone through the upgrade process several times on held nodes at this point, so it should be straightforward
20:16 <opendevreview> Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests  https://review.opendev.org/c/opendev/git-review/+/920845
20:17 <fungi> yeah, the plan laid out in the etherpad looks good, and the thorough upgrade and downgrade testing suggests there's not much to worry about
20:26 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Add not and tox py312 jobs  https://review.opendev.org/c/zuul/zuul-jobs/+/920841
20:26 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Update ansible versions used in unittesting  https://review.opendev.org/c/zuul/zuul-jobs/+/920857
20:39 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Update ansible versions used in unittesting  https://review.opendev.org/c/zuul/zuul-jobs/+/920857
20:39 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Add nox and tox py312 jobs  https://review.opendev.org/c/zuul/zuul-jobs/+/920841
21:15 *** clarkb is now known as Guest8094
21:16 <Guest8094> again?
21:16 <fungi> again!
21:17 <Guest8094> is it affecting others or largely just me?
21:17 <fungi> you and one other user timed out at the same moment
21:17 <fungi> no, i take that back, it was 26 minutes apart, so just you
21:17 <fungi> it's been a ping timeout for your connection both times
21:18 <fungi> also seems like your client may be auto-joining before it identifies you to nickserv
21:18 <Guest8094> yes, I don't have sasl set up with oftc because it's a pain
21:18 <fungi> i don't think oftc has sasl support
21:18 <Guest8094> ah well, that explains it
21:19 <fungi> some clients support delaying channel autojoins until you hear confirmation from nickserv
21:19 <fungi> oftc does support cert-based connection auth, but yes, its support varies a bit by client
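
For reference, OFTC's cert-based auth (CertFP) works roughly like this; how the certificate gets wired into the client is client-specific:

    # generate a combined key+certificate for the client to present
    openssl req -x509 -new -newkey rsa:4096 -sha256 -days 1096 \
        -nodes -out oftc.pem -keyout oftc.pem
    # after connecting with that certificate, register its fingerprint:
    #   /msg NickServ cert add
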
21:19 *** Guest8094 is now known as clarkb
23:08 <opendevreview> Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests  https://review.opendev.org/c/opendev/git-review/+/920845
23:19 <clarkb> somehow I'm still connected to irc
23:19 <clarkb> fungi: I guess verifying git output isn't super useful in a git-review test?
23:19 <fungi> especially verifying that git checkout did what we asked it to do
23:20 <fungi> it wasn't a case of confirming that a git-review invocation of git did the right thing; the test was literally just checking behind a git checkout subprocess
23:21 <fungi> and i think we have enough trust that git will do what we ask it during test setup that if it doesn't, then we have bigger issues than just git-review
23:22 <clarkb> ++
23:24 <fungi> but mainly, the response it gives to that command on stdout changes so drastically that we'd basically end up checking against multiple possible strings; it's not just a minor difference we can wildcard in one or two spots
23:25 <fungi> and some of the existing wildcards in the old regex indicate it's changed more than once over the lifetime of that test
23:25 <clarkb> ya, seems reasonable to trust git
23:27 <clarkb> I think I'm going to pop out a little early today. Tomorrow I'll try to get an early start to ensure everything is ready to go for gerrit upgrading
23:28 <fungi> sounds good. i'm around for a bit still, but splitting attention with packing
23:38 <opendevreview> Jeremy Stanley proposed opendev/git-review master: Update the upper bound for Python and Gerrit tests  https://review.opendev.org/c/opendev/git-review/+/920845
