fungi | there's some odd errors in the log | 00:00 |
---|---|---|
fungi | ahh, looks like it may be due to repos with no branch named "master" | 00:01 |
clarkb | oh right because hound only looks at master | 00:01 |
ianw | oh, they merged something about that | 00:01 |
clarkb | that makes sense | 00:01 |
fungi | couldn't find remote ref master | 00:01 |
fungi | ambiguous argument 'origin/master': unknown revision or path not in the working tree. | 00:01 |
ianw | it looks like it has a head branch detector | 00:03 |
ianw | it calls | 00:07 |
ianw | git remote show origin | 00:07 |
ianw | and | 00:07 |
ianw | var headBranchRegexp = regexp.MustCompile(`HEAD branch: (?P<branch>.+)`) | 00:07 |
ianw | /var/lib/hound/data/vcs-1904f5a1b65975a88e16b96f9ef2c83aa8cbb0c0# git remote show origin | 00:07 |
ianw | HEAD branch: main | 00:07 |
ianw | ... i.e. i should be finding it | 00:07 |
clarkb | huh | 00:07 |
ianw | that repo is cirros | 00:08 |
clarkb | ya frickler added it as a main branch test case. Everything worked well until now :) | 00:08 |
ianw | hang on, we may need to set it | 00:08 |
ianw | oh, it's jeepyb that outputs the config iirc | 00:10 |
ianw | yeah we need to set "detect-ref" in the config | 00:11 |
clarkb | ah | 00:12 |
opendevreview | Ian Wienand proposed opendev/jeepyb master: hound: add detect-ref config option https://review.opendev.org/c/opendev/jeepyb/+/830919 | 00:14 |
ianw | // Open an index at the given path. If the idxDir is already present, it will | 00:18 |
ianw | // simply open and use that index. If, however, the idxDir does not exist a new | 00:18 |
ianw | // one will be built. | 00:18 |
ianw | this suggests to me we could remove idx-* files, which might at least stop it having to reclone everything | 00:18 |
fungi | and just keep the vcs-* files, yeah | 00:20 |
ianw | it looks like we install from jeepyb master (git+https://opendev.org/opendev/jeepyb#egg=jeepyb) | 00:20 |
ianw | so if https://review.opendev.org/c/opendev/jeepyb/+/830919 is ok we just need to trigger an image rebuild | 00:21 |
Clark[m] | Looks fine to me. But I'm off the computer to start some early dinner prep | 00:23 |
opendevreview | Steve Baker proposed openstack/diskimage-builder master: Revert "Revert "Detect boot and EFI partitions in extract-image"" https://review.opendev.org/c/openstack/diskimage-builder/+/830900 | 01:52 |
opendevreview | Ian Wienand proposed opendev/jeepyb master: hound: add detect-ref config option https://review.opendev.org/c/opendev/jeepyb/+/830919 | 01:54 |
opendevreview | Ian Wienand proposed opendev/system-config master: hound: enable detect-ref https://review.opendev.org/c/opendev/system-config/+/830926 | 02:57 |
opendevreview | Ian Wienand proposed opendev/system-config master: hound: enable detect-ref https://review.opendev.org/c/opendev/system-config/+/830926 | 04:06 |
*** bhagyashris is now known as bhagyashris|ruck | 04:38 | |
*** pojadhav|out is now known as pojadhav | 05:26 | |
opendevreview | Ian Wienand proposed opendev/system-config master: hound: enable detect-ref https://review.opendev.org/c/opendev/system-config/+/830926 | 06:27 |
*** ysandeep|out is now known as ysandeep | 06:28 | |
*** amoralej|off is now known as amoralej | 07:11 | |
*** luigi is now known as luigi-training | 07:24 | |
*** luigi-training is now known as luigi | 07:24 | |
*** jpena|off is now known as jpena | 08:05 | |
*** bhagyashris_ is now known as bhagyashris|ruck | 09:55 | |
*** ysandeep is now known as ysandeep|afk | 10:02 | |
*** bhagyashris_ is now known as bhagyashris|ruck | 10:44 | |
*** rlandy|out is now known as rlandy|ruck | 10:48 | |
*** ysandeep|afk is now known as ysandeep | 12:00 | |
*** frenzyfriday|rover is now known as frenzyfriday | 12:49 | |
*** ysandeep is now known as ysandeep|away | 12:58 | |
*** amoralej is now known as amoralej|lunch | 13:21 | |
*** ysandeep|away is now known as ysandeep | 13:42 | |
*** Guest229 is now known as diablo_rojo_phone | 14:01 | |
*** amoralej|lunch is now known as amoralej | 14:03 | |
opendevreview | Florian Haas proposed zuul/zuul-jobs master: tox: Include ansible_env in tox_environment https://review.opendev.org/c/zuul/zuul-jobs/+/830992 | 14:15 |
corvus | i'd like to do a rolling restart of zuul | 14:28 |
corvus | starting that now | 14:30 |
fungi | thanks corvus, sounds fine to me | 14:37 |
opendevreview | Florian Haas proposed zuul/zuul-jobs master: tox: Include ansible_env in tox_environment https://review.opendev.org/c/zuul/zuul-jobs/+/830992 | 14:46 |
*** ysandeep is now known as ysandeep|out | 14:52 | |
*** dviroel is now known as dviroel|lunch | 14:57 | |
*** amoralej is now known as amoralej|off | 15:00 | |
*** ykarel is now known as ykarel|away | 15:22 | |
opendevreview | Florian Haas proposed zuul/zuul-jobs master: tox: Include ansible_env in tox_environment https://review.opendev.org/c/zuul/zuul-jobs/+/830992 | 15:41 |
opendevreview | Florian Haas proposed zuul/zuul-jobs master: tox: Include ansible_env in tox_environment https://review.opendev.org/c/zuul/zuul-jobs/+/830992 | 15:43 |
*** dviroel|lunch is now known as dviroel | 16:04 | |
BlaisePabon[m] | I'm new here and will lurk before making a fool of myself. | 16:18 |
BlaisePabon[m] | (I'm also using my "work" id, so I may return with my personal account...) | 16:19 |
clarkb | BlaisePabon[m]: welcome. And feel free to ask questions if there is anything we can help with | 16:19 |
clarkb | infra-root any reason to not approve https://review.opendev.org/c/opendev/system-config/+/830874 now? I'm not sure if we'd prefer to wait on the zuul rolling restart to complete? though that may take hours? | 16:20 |
corvus | clarkb: i wouldn't wait for the restart; it will take ... all day? :) | 16:20 |
BlaisePabon[m] | Thank you Clark. TL;DR: I'm rebuilding my home lab and I want to look into collaborating with open Dev (I have a lot of compute in my garage). | 16:21 |
clarkb | corvus: cool I'll approve it then and if something comes up before it merges we can remove the +A | 16:21 |
corvus | is https://grafana.opendev.org/d/6c807ed8fd/nodepool?orgId=1 the best view of the nodepool status these days? i think that shows all our node use and the clouds we're using right now? | 16:25 |
corvus | Blaise Pabon: ^ might be interesting to see what's in use | 16:25 |
corvus | and i think https://docs.opendev.org/opendev/system-config/latest/contribute-cloud.html has an overview of contributing resources | 16:26 |
clarkb | corvus: I also like the nodepool views on the zuul dashboard in grafana | 16:27 |
corvus | this one: https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1 | 16:28 |
corvus | which i'm watching right now since ze01 is the straggler in the first batch of executor restarts | 16:28 |
corvus | there's always that one job... | 16:29 |
*** marios is now known as marios|out | 16:42 | |
*** jpena is now known as jpena|off | 16:56 | |
clarkb | once etherpad is done I think I may approve the zuul-registry fix and then recheck fungi's change | 16:59 |
fungi | thanks, i'm still in my morning meeting tunnel for the next hour (even though it's no longer morning here), but i can help monitor it once i'm done | 17:00 |
corvus | we're onto the second batch of executor restarts now | 17:08 |
corvus | this'll be the longer one (since all new jobs have started here) | 17:09 |
opendevreview | Merged opendev/system-config master: Update Etherpad to 1.8.17 https://review.opendev.org/c/opendev/system-config/+/830874 | 17:09 |
corvus | seems to be running jobs :) | 17:11 |
clarkb | etherpad service is restarting now | 17:28 |
clarkb | https://etherpad.opendev.org/p/isitbroken loads for me | 17:29 |
clarkb | lgtm | 17:30 |
corvus | i'm going to afk until later this afternoon. i think since the first batch of executors looks good, the ongoing rolling restart will be fine. it will just restart the second batch of executors and the mergers. it will not do scheduler or web; i will do that after i get back. | 17:36 |
clarkb | corvus: ok, any concerns with me approving that zuul-registry change while you are gone? | 17:37 |
clarkb | oh looks like you said my plan sounds good in zuul room. I'll proceed soon then | 17:37 |
corvus | clarkb: ++ | 17:37 |
corvus | the rolling restart is just an ansible playbook i'm running on bridge... so if some emergency comes up, that's the thing to kill | 17:38 |
clarkb | noted | 17:38 |
corvus | root 20070 11.6 0.7 326572 59016 pts/4 Rl+ 14:30 21:54 /usr/bin/python3 /usr/local/bin/ansible-playbook -f 20 playbooks/zuul_rolling_restart.yaml | 17:39 |
corvus | for the record | 17:39 |
clarkb | zuul-registry update has been approved. I'll monitor it after it lands. Likely by rechecking funig's change | 18:05 |
fungi | looks like that merged, thanks! | 18:09 |
corvus | ah, evil count funig | 18:09 |
corvus | and his dastardly change | 18:10 |
fungi | clarkb: i don't know if deploy jobs are succeeding yet. did ianw's default branch fixes for jeepyb/hound merge yet? | 18:10 |
clarkb | fungi: the deploys should be fine without ianw's fix I think | 18:11 |
fungi | looks like no, they haven't been approved yet | 18:11 |
clarkb | but no the jeepyb update is failing testing | 18:11 |
fungi | oh! right, the deploy jobs will work because i cleaned up the rootfs | 18:11 |
fungi | sorry :/ | 18:11 |
fungi | i'm still trying to context-shift out of my day-o-meetins | 18:12 |
fungi | clarkb: the jeepyb change is passing | 18:12 |
clarkb | hrm not the one I had in my browser | 18:13 |
clarkb | maybe there is a different one | 18:13 |
fungi | there was an egregious abomination before the whitespace gods in patchset 1 | 18:13 |
fungi | 830919 | 18:13 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/830916 that one is the one I was looking at | 18:13 |
fungi | oh, aha | 18:13 |
clarkb | which if anything was going to affect deploy it would be ^ but I'm pretty sure deploy is fine | 18:14 |
fungi | thanks, i missed that one. i was looking at the two default branch fixes | 18:14 |
clarkb | since this was just silently not doing anything before and would only fail in the codesearch playbook which nothing should depend on | 18:14 |
fungi | right | 18:16 |
clarkb | fungi: have you checked if your apache config update has applied yet? | 18:16 |
clarkb | I approved the jeepyb change but I don't think it shares a queue with system-config so approval on the next change isn't helpful just yet | 18:18 |
fungi | clarkb: if you meant 829975 no it hasn't merged yet | 18:19 |
clarkb | I'm confused. What looked like it merged then? | 18:19 |
clarkb | I agree the change I was thinking of has not merged so can be rechecked soon once the zuul-registry change merges | 18:20 |
fungi | i was talking about the buildset registry fix for the thing 829975 was failing on, but misread some notifications from gerrit i think | 18:20 |
fungi | as that doesn't seem to have merged either yet | 18:20 |
clarkb | ah got it. | 18:20 |
clarkb | ya it's in the gate and should land shortly. Then once the image for it publishes we can recheck 829975 | 18:20 |
fungi | yeah, sorry, i'm still a little scattered and trying to catch back up | 18:20 |
clarkb | I also want to get an update to the gitea 1.16.1 change to point it at 1.16.2 now | 18:21 |
fungi | oh, that got tagged? excellent | 18:21 |
clarkb | yup I don't think it has the fix we wanted but many other fixes seem to be included :) | 18:22 |
clarkb | https://github.com/go-gitea/gitea/blob/v1.16.2/CHANGELOG.md | 18:23 |
clarkb | I like to take my time with those updates and cross check file diffs and so on. So I' | 18:23 |
clarkb | *I'll wait until I'm happy with zuul-registry and your change before looking at that | 18:23 |
clarkb | ok https://hub.docker.com/layers/zuul/zuul-registry/latest/images/sha256-f2a9be9b41c0dd2713ccbdc095d9215eaf32a0a4236684a49c5e19285cb9a34d?context=explore seems updated | 18:27 |
clarkb | and change has been rechecked | 18:27 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update Gitea to 1.16.2 https://review.opendev.org/c/opendev/system-config/+/828184 | 18:46 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM change to test and hold gitea 1.16.2 https://review.opendev.org/c/opendev/system-config/+/828586 | 18:46 |
clarkb | zuul's lack of executors is leading to slow job enqueues so I went ahead and did ^ | 18:47 |
clarkb | there is a new hold in place for that and I deleted the old hold as well as the hold for etherpad and gitea link verification | 18:47 |
opendevreview | Merged opendev/jeepyb master: hound: add detect-ref config option https://review.opendev.org/c/opendev/jeepyb/+/830919 | 19:03 |
clarkb | heh ^ rebuilds the gerrit image to include the jeepyb update, which isn't necessary in this case. But we should think about a gerrit restart on up to date images soon (maybe do the mergeability checks and remove jvm gc logs too at the same time early next week?) | 19:06 |
fungi | yeah, all those together would be good for a restart | 19:17 |
fungi | or if we want to prioritize reviewing those today i can plan to do a quick restart over the weekend | 19:18 |
fungi | worst case, if something doesn't work, i'll roll back to a specific image and restart again with deployment to review disabled | 19:19 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/830912 is the jvm gc log removal change. Note you have to remove the files (or move them aside) from the log dir too otherwise the issue seems to persist | 19:20 |
fungi | huh, interesting, so remove them while the container is down i guess | 19:20 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/829882 is the mergeable check and has reviews. I would say prioritize this one anyway and the jvm gc log thing is more of a minor annoyance | 19:20 |
clarkb | fungi: ya the commit message indicates the process. You stop gerrit, move the files aside, remove the entry from docker-compose.yaml (though ansible probably already did this which is fine) then start gerrit again | 19:21 |
fungi | read it, very clear now, thanks | 19:26 |
fungi | lgtm, happy to approve it or wait for a second reviewer | 19:26 |
clarkb | I'm happy to do it on monday but if others want to do it sooner than that no objections from me either | 19:27 |
clarkb | zuul is pretty backed up right now so no rush due to that either :) | 19:27 |
clarkb | It should catch up once it starts the second half of executors that were stopped | 19:27 |
fungi | as i said, i'm available to do a restart over the weekend, i'm not too concerned about any of these additions making trouble, and if we get the gc logging change in then i'll make sure to do the dance recommended in its commit message as part of the restart | 19:28 |
clarkb | sounds good, thanks | 19:28 |
clarkb | the gitea image build for 1.16.2 jumped ahead of the gerrit change and is likely to exercise the buildset registry first | 19:30 |
clarkb | (it has a node and the gerrit jobs don't yet, but neither has an executor) | 19:30 |
fungi | executor roulette | 19:37 |
clarkb | https://zuul.opendev.org/t/openstack/stream/b3783abcb11749a98a772aaf18c69fa8?logfile=console.log it is running now | 19:41 |
clarkb | I think it is the end of that job where it writes to the buildset registry that we care about | 19:41 |
clarkb | then the system-config-run-gitea job will fetch that image and test it | 19:41 |
clarkb | I think it pushed to the buildset registry happily. This is a smaller image though so really just checks that we didn't introduce a fatal regression and not necessarily that the issue is resolved | 19:48 |
clarkb | we are waiting on one more executor to finish up then we'll restart all 6 and be back to a full set | 19:51 |
clarkb | ze08 | 19:56 |
clarkb | lunch now but I think it is at least as functional as before | 19:58 |
*** dviroel is now known as dviroel|afk | 20:21 | |
opendevreview | Merged opendev/system-config master: Restore is:mergeable predicate in Gerrit https://review.opendev.org/c/opendev/system-config/+/829882 | 20:42 |
clarkb | I wonder if zuul could "reparent" paused jobs to speed up these restarts. There is no active ssh connection at that point. You'd basically stick the job into a "please deal with this when completed" queue? | 20:53 |
clarkb | tracking state for that might be complicated though since we attach builds firmly to a specific executor right now iirc | 20:54 |
clarkb | the gitea 1.16.2 builds were all successful. I expect zuul-registry is fine but still waiting on the gerrit image builds to hit it with the large layers | 21:08 |
spencerharmon | Hello! Just wanted to introduce myself. I'm new to opendev/gerrit. I just submitted a review for the first time. Hopefully I did everything correctly. I see there's another channel for octavia. In case I have questions about feedback or the review process, would those be more appropriate in this channel or in #octavia? Also, review is https://review.opendev.org/c/openstack/octavia/+/831051 | 21:16 |
johnsom | spencerharmon Hi and welcome! | 21:19 |
fungi | spencerharmon: yep, in here is where we run the systems which make code review and other sorts of collaboration possible, but it'll be the octavia contributors reviewing your change so that channel is probably more appropriate if you have questions for the people who are going to be looking at it | 21:19 |
fungi | also, welcome! and have a great time | 21:19 |
spencerharmon | Awesome, thanks! :) | 21:20 |
johnsom | spencerharmon The Octavia team hangs out in #openstack-lbaas. We are more than happy to help you out with gerrit/patches/etc. | 21:20 |
spencerharmon | Oh, good to know. I'll join that one. | 21:20 |
spencerharmon | One thing I haven't come across in the documentation so far is the backport process. I'm running this commit against victoria in the lab for compatibility with a third party driver. If this is accepted, I'd like to apply it to victoria as well. Do any of y'all know where that process is documented or have any related advice? | 21:26 |
clarkb | spencerharmon: https://docs.openstack.org/project-team-guide/stable-branches.html#proposing-fixes is openstack backport procedure (with lots of other backport related info in that doc) | 21:28 |
spencerharmon | Thank you so much! | 21:28 |
clarkb | spencerharmon: it's basically cherry-pick -x from the master branch in order going backwards until things are not supported or no longer needed anymore | 21:29 |
clarkb | fungi: arg https://zuul.opendev.org/t/openstack/build/5f67ee2a2fcb4eec8529a80e726319e3/log/job-output.txt#7434 My hunch is that we converted the short size read to a 404 because the blob isn't fully on disk yet and docker doesn't retry | 21:31 |
clarkb | neither situation is correct and I think the failures would happen in either case. Basically if we failed before we'd fail now and vice versa so I don't think we need a quick revert. But we need to get the buildset registry logs and for that have to wait for the job to complete | 21:32 |
spencerharmon | Ah, gotcha. Looks like `git review <branch>` is the command I'll need, after cherry picking. Looks pretty easy. Thanks! | 21:32 |
clarkb | corvus: ^ fyi. Basically I think the error changed and is maybe easier to track down now, but we didn't fix the problem | 21:32 |
clarkb | hrm however those failures happened in iweb not ovh like before. So maybe this is worse than before (less sensitive to disk speed and more just fail like?) | 21:34 |
ianw | infra-root: if you get a chance to look at https://review.opendev.org/c/opendev/system-config/+/830784/1 and https://review.opendev.org/c/opendev/system-config/+/830785 that enables the log export for all jobs, and updates the documentation respectively | 21:35 |
ianw | the doc update is bigger, once i started it seemed to fall out better into a separate section. thoughts welcome of course :) | 21:35 |
ianw | on my monday i'll push through the glean/dib stuff | 21:36 |
clarkb | ianw: thanks | 21:37 |
clarkb | I'll take a look once I've gotten zuul-registry mostly sorted | 21:37 |
ianw | thanks for the reviews on the hound stuff, i think we can check back on that in a week and see if we can pinpoint what's growing | 21:38 |
clarkb | zuul's executors are all back to full strength now too | 21:40 |
ianw | clarkb: do you think pushing a big image in the test job might help? | 21:41 |
ianw | i started to look at putting mitmproxy in that job too, from my rough notes on how i set it up when i was debugging | 21:42 |
clarkb | ianw: I don't think there is a test job currently, but yes the problem seems specifically related to large layers like the one we get with gerrit | 21:42 |
clarkb | oh there is a system-config-run-registry job | 21:42 |
ianw | if you think that would help i can try to get that going | 21:42 |
clarkb | that doesn't run against zuul-registry changes, but I see what you mean. Ya that might help, maybe via depends on between zuul-registry and system-config to try and trip it | 21:42 |
ianw | yeah the build-image does do a functional test; https://opendev.org/zuul/zuul-registry/src/branch/master/.zuul.yaml#L24 | 21:43 |
clarkb | oh interesting. But ya I think the key to tripping this is a large layer. can probably do that via RUN dd if=/dev/urandom of=/foo count=SOMENUMBER | 21:45 |
clarkb | and then try and push the result of that into the registry | 21:45 |
ianw | yeah not 100% sure where the test image comes from that it pushes there | 21:45 |
ianw | https://opendev.org/zuul/zuul-registry/src/commit/8f1f0705f79f9ac13c768aad5a604a51152b5b41/playbooks/functional-test/setup.yaml#L63 | 21:47 |
ianw | seems to call "buildah from scratch", so that seems like it would be a very empty base container | 21:48 |
clarkb | I do wonder if some other action we're doing is indicating to the client that it can do the HEAD to read the size back | 21:48 |
clarkb | but I read through the api spec a bit yesterday and what that may be isn't clear to me | 21:49 |
clarkb | but it is possible the real underlying bug here is that we're taking some action too early which causes the client to get ahead of itself (since it should know the put request isn't complete yet why is it doing a head?) | 21:49 |
ianw | yeah honestly the only way i figured things out was the mitmdump and correlating that against the api spec; even then i didn't figure it out, but it gave upstream enough info to help | 21:49 |
ianw | looks like you could maybe update that to "buildah from <something big>" just as a first step in pushing more data in the test? | 21:51 |
clarkb | ianw: ++ | 21:51 |
clarkb | could use our gerrit image even | 21:51 |
clarkb | I've just realized that the big ~200MB blob giving us trouble is pushed by both jobs at around the same time. I'm curating a log paste, but I think the next thing to check is a run that only does gerrit 3.5 | 23:32 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing zuul-registry via gerrit builds https://review.opendev.org/c/opendev/system-config/+/831064 | 23:54 |
clarkb | https://paste.opendev.org/show/bS1Udj5RpRm8dkcFaVac/ are the annotated logs | 23:56 |
clarkb | I've written down what I think are some potential problems but I'm not sure the logs show those specific issues happening in these failures. It is enough to make me suspicious though | 23:56 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing zuul-registry via gerrit builds https://review.opendev.org/c/opendev/system-config/+/831064 | 23:57 |
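
Editor's sketch (not part of the channel log): the head-branch detection ianw describes around 00:07 amounts to running `git remote show origin` and matching the `HEAD branch:` line. The Python below mirrors the regular expression quoted in the log; it is only an illustration, not hound's actual code. The function name, the `master` fallback, and the abbreviated sample output are assumptions made for the example.

```python
import re

# Same pattern as the Go regexp quoted in the log, written in Python syntax.
HEAD_BRANCH_RE = re.compile(r"HEAD branch: (?P<branch>.+)")


def detect_head_branch(remote_show_output, fallback="master"):
    """Return the branch named on the 'HEAD branch:' line, else the fallback.

    The fallback value is an assumption for this sketch, not hound behaviour.
    """
    match = HEAD_BRANCH_RE.search(remote_show_output)
    if match is None:
        return fallback
    return match.group("branch").strip()


# Abbreviated, illustrative output of `git remote show origin`
# (the URL is hypothetical).
sample = """* remote origin
  Fetch URL: https://example.org/hypothetical/repo
  HEAD branch: main
"""
print(detect_head_branch(sample))
```

For a repository like cirros, whose default branch is `main`, a lookup of this kind yields `main` rather than assuming `origin/master`, which is the ref the errors at 00:01 complain about.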
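
Editor's sketch (not part of the channel log): clarkb's 21:45 suggestion leaves the `dd` block count as `SOMENUMBER`. Without a `bs=` argument `dd` writes 512-byte blocks, so the count for a given layer size is simple arithmetic. The helper below renders a two-line Containerfile for a target size. The base image name is hypothetical (it would need a shell and `dd`), and 200 MiB is only a stand-in for the "~200MB blob" mentioned at 23:32.

```python
DD_DEFAULT_BLOCK_BYTES = 512  # dd's default block size when bs= is not given


def dd_count_for(size_bytes, block_bytes=DD_DEFAULT_BLOCK_BYTES):
    """Number of blocks dd must write to reach at least size_bytes."""
    return -(-size_bytes // block_bytes)  # ceiling division


def large_layer_containerfile(base_image, size_bytes):
    """Render a Containerfile whose single RUN step creates one large layer."""
    count = dd_count_for(size_bytes)
    return (
        f"FROM {base_image}\n"
        f"RUN dd if=/dev/urandom of=/foo count={count}\n"
    )


# Hypothetical base image; 200 MiB approximates the blob size from the log.
print(large_layer_containerfile("example.org/hypothetical/base:latest",
                                200 * 1024 * 1024))
```

Random data is a reasonable choice here because it does not compress, so the pushed layer stays close to the requested size.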