*** dviroel|out is now known as dviroel | 01:30 | |
*** rlandy|ruck|bbl is now known as rlandy|ruck | 01:50 | |
*** rlandy|ruck is now known as rlandy|out | 01:59 | |
*** dviroel is now known as dviroel|out | 02:02 | |
opendevreview | Merged opendev/puppet-apparmor master: Retire this repo https://review.opendev.org/c/opendev/puppet-apparmor/+/829759 | 02:44 |
---|---|---|
opendevreview | Merged opendev/puppet-askbot master: Retire this repo https://review.opendev.org/c/opendev/puppet-askbot/+/829760 | 02:44 |
opendevreview | Merged opendev/puppet-asterisk master: Retire this repo https://review.opendev.org/c/opendev/puppet-asterisk/+/829761 | 02:45 |
opendevreview | Merged opendev/puppet-bandersnatch master: Retire this repo https://review.opendev.org/c/opendev/puppet-bandersnatch/+/829762 | 02:46 |
opendevreview | Merged opendev/puppet-bugdaystats master: Retire this repo https://review.opendev.org/c/opendev/puppet-bugdaystats/+/829763 | 02:48 |
opendevreview | Merged opendev/puppet-ciwatch master: Retire this repo https://review.opendev.org/c/opendev/puppet-ciwatch/+/829764 | 02:51 |
opendevreview | Merged opendev/puppet-diskimage_builder master: Retire this repo https://review.opendev.org/c/opendev/puppet-diskimage_builder/+/829766 | 02:54 |
opendevreview | Merged opendev/puppet-germqtt master: Retire this repo https://review.opendev.org/c/opendev/puppet-germqtt/+/829767 | 02:54 |
opendevreview | Merged opendev/puppet-grafyaml master: Retire this repo https://review.opendev.org/c/opendev/puppet-grafyaml/+/829768 | 02:54 |
opendevreview | Merged opendev/puppet-graphite master: Retire this repo https://review.opendev.org/c/opendev/puppet-graphite/+/829769 | 02:55 |
opendevreview | Merged opendev/puppet-haveged master: Retire this repo https://review.opendev.org/c/opendev/puppet-haveged/+/829790 | 02:55 |
opendevreview | Merged opendev/puppet-hound master: Retire this repo https://review.opendev.org/c/opendev/puppet-hound/+/829791 | 02:56 |
opendevreview | Merged opendev/puppet-infra-cookiecutter master: Retire this repo https://review.opendev.org/c/opendev/puppet-infra-cookiecutter/+/829793 | 02:56 |
opendevreview | Merged opendev/puppet-jenkins master: Retire this repo https://review.opendev.org/c/opendev/puppet-jenkins/+/829794 | 02:56 |
opendevreview | Merged opendev/puppet-kerberos master: Retire this repo https://review.opendev.org/c/opendev/puppet-kerberos/+/829795 | 02:57 |
opendevreview | Merged opendev/puppet-lodgeit master: Retire this repo https://review.opendev.org/c/opendev/puppet-lodgeit/+/829796 | 02:57 |
opendevreview | Merged opendev/puppet-lpmqtt master: Retire this repo https://review.opendev.org/c/opendev/puppet-lpmqtt/+/829798 | 02:58 |
opendevreview | Merged opendev/puppet-mailman master: Retire this repo https://review.opendev.org/c/opendev/puppet-mailman/+/829799 | 02:58 |
opendevreview | Merged opendev/puppet-mediawiki master: Retire this repo https://review.opendev.org/c/opendev/puppet-mediawiki/+/829800 | 02:59 |
opendevreview | Merged opendev/puppet-meetbot master: Retire this repo https://review.opendev.org/c/opendev/puppet-meetbot/+/829801 | 02:59 |
opendevreview | Merged opendev/puppet-mosquitto master: Retire this repo https://review.opendev.org/c/opendev/puppet-mosquitto/+/829802 | 02:59 |
opendevreview | Merged opendev/puppet-mqtt_statsd master: Retire this repo https://review.opendev.org/c/opendev/puppet-mqtt_statsd/+/829803 | 03:00 |
opendevreview | Merged opendev/puppet-nodepool master: Retire this repo https://review.opendev.org/c/opendev/puppet-nodepool/+/829806 | 03:01 |
opendevreview | Merged opendev/puppet-openafs master: Retire this repo https://review.opendev.org/c/opendev/puppet-openafs/+/829807 | 03:02 |
opendevreview | Merged opendev/puppet-openstackci master: Retire this repo https://review.opendev.org/c/opendev/puppet-openstackci/+/829808 | 03:02 |
opendevreview | Merged opendev/puppet-pgsql_backup master: Retire this repo https://review.opendev.org/c/opendev/puppet-pgsql_backup/+/829809 | 03:03 |
opendevreview | Merged opendev/puppet-planet master: Retire this repo https://review.opendev.org/c/opendev/puppet-planet/+/829810 | 03:03 |
opendevreview | Merged opendev/puppet-ptgbot master: Retire this repo https://review.opendev.org/c/opendev/puppet-ptgbot/+/829811 | 03:03 |
opendevreview | Merged opendev/puppet-puppet master: Retire this repo https://review.opendev.org/c/opendev/puppet-puppet/+/829812 | 03:04 |
opendevreview | Merged opendev/puppet-refstack master: Retire this repo https://review.opendev.org/c/opendev/puppet-refstack/+/829814 | 03:04 |
opendevreview | Merged opendev/puppet-ssl_cert_check master: Retire this repo https://review.opendev.org/c/opendev/puppet-ssl_cert_check/+/829815 | 03:05 |
opendevreview | Merged opendev/puppet-statusbot master: Retire this repo https://review.opendev.org/c/opendev/puppet-statusbot/+/829816 | 03:05 |
opendevreview | Merged opendev/puppet-sudoers master: Retire this repo https://review.opendev.org/c/opendev/puppet-sudoers/+/829817 | 03:06 |
opendevreview | Merged opendev/puppet-translation_checksite master: Retire this repo https://review.opendev.org/c/opendev/puppet-translation_checksite/+/829818 | 03:06 |
opendevreview | Merged opendev/puppet-unattended_upgrades master: Retire this repo https://review.opendev.org/c/opendev/puppet-unattended_upgrades/+/829819 | 03:06 |
opendevreview | Merged opendev/puppet-unbound master: Retire this repo https://review.opendev.org/c/opendev/puppet-unbound/+/829820 | 03:07 |
opendevreview | Merged opendev/puppet-zuul master: Retire this repo https://review.opendev.org/c/opendev/puppet-zuul/+/829821 | 03:07 |
opendevreview | Merged opendev/askbot-theme master: Retire this repo https://review.opendev.org/c/opendev/askbot-theme/+/829822 | 03:07 |
opendevreview | Merged opendev/germqtt master: Retire this repo https://review.opendev.org/c/opendev/germqtt/+/829823 | 03:10 |
opendevreview | Merged opendev/lpmqtt master: Retire this repo https://review.opendev.org/c/opendev/lpmqtt/+/829824 | 03:10 |
opendevreview | Merged opendev/mqtt_statsd master: Retire this repo https://review.opendev.org/c/opendev/mqtt_statsd/+/829825 | 03:11 |
*** ysandeep|out is now known as ysandeep | 04:36 | |
*** ysandeep is now known as ysandeep|lunch | 07:21 | |
*** ysandeep|lunch is now known as ysandeep | 07:49 | |
*** ricolin is now known as Guest1112 | 07:59 | |
*** ricolin_ is now known as ricolin | 07:59 | |
yoctozepto | clarkb: hi! yeah, I was out; I confirm it's working now | 08:10 |
*** jpena|off is now known as jpena | 08:37 | |
*** ysandeep is now known as ysandeep|afk | 10:25 | |
*** rlandy|out is now known as rlandy|ruck | 11:12 | |
*** dviroel|out is now known as dviroel | 11:15 | |
*** ysandeep|afk is now known as ysandeep | 11:17 | |
*** pojadhav- is now known as pojadhav | 12:44 | |
*** ricolin_ is now known as ricolin | 14:13 | |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 15:01 | |
*** dviroel is now known as dviroel|lunch | 15:22 | |
johnsom | Any idea why this patch didn't launch in Zuul? https://review.opendev.org/c/openstack/python-designateclient/+/831687 | 15:45 |
frickler | johnsom: I was just asking the same in the release channel | 15:47 |
frickler | I'm not sure whether we're missing branching devstack before branching clients | 15:47 |
frickler | gmann: ^^ ? | 15:47 |
fungi | is it possible the jobs are too aggressively limiting what files trigger them, and a change which only alters .gitreview matches no jobs? | 15:48 |
fungi | i would expect the new branch of designate to fall back to finding job definitions in devstack's master branch if it lacks a stable/yoga | 15:49 |
frickler | fungi: for the .gitreview change that might explain it, but there's also the tox.ini one. and seems to affect a couple of repos, though not everyone | 15:51 |
fungi | ahh, okay | 15:51 |
fungi | setting debug in a dnm change to that branch might give more info | 15:51 |
johnsom | Hmmm, someone just rechecked and it launched | 15:52 |
johnsom | This one is in the same state: https://review.opendev.org/c/openstack/python-designateclient/+/831690 | 15:53 |
johnsom | I can recheck these, but if you want me to hold off for debugging let me know | 15:53 |
johnsom | It looks like all of the Octavia bot-produced patches didn't launch either. Six there. | 15:55 |
gmann | frickler: johnsom: we need to wait for all projects to finish branching, and then we branch devstack and grenade | 15:57 |
johnsom | They rechecked the other one now too. Again, it launched this time | 15:58 |
gmann | "all projects" means the devstack-supported projects, for which the release team gives us the signal | 15:58 |
gmann | if we branch devstack before the projects it depends on, then it will fail in clone | 15:59 |
frickler | hmm, if recheck works, it might be some race condition in zuul or gerrit between creating the branch and checking patches on that branch | 15:59 |
fungi | zuul01 saw the initial patchset upload... 2022-03-03 10:54:05,369 DEBUG zuul.Pipeline.openstack.check: [e: 88388695ddd04990a7108eaf0a1a84d2] Event <GerritTriggerEvent patchset-created opendev.org/openstack/python-designateclient stable/yoga 831687,1> for change <Change 0x7f689eb0e940 openstack/python-designateclient 831687,1> matched <GerritEventFilter connection: gerrit types: patchset-created | 15:59 |
fungi | ignore_deletes: True> in pipeline <IndependentPipelineManager check> | 15:59 |
johnsom | Yeah, I don't think these patches even need devstack for testing. They only do unit tests, etc. | 16:00 |
fungi | 2022-03-03 10:54:25,884 INFO zuul.Pipeline.openstack.check: [e: 88388695ddd04990a7108eaf0a1a84d2] Adding change <Change 0x7f689eb0e940 openstack/python-designateclient 831687,1> to queue <ChangeQueue check: > in <Pipeline check> | 16:00 |
fungi | 2022-03-03 10:56:16,594 DEBUG zuul.Pipeline.openstack.check: [e: 88388695ddd04990a7108eaf0a1a84d2] No jobs for change <Change 0x7f689eb0e940 openstack/python-designateclient 831687,1> | 16:04 |
fungi | so, yeah, it thought there were no jobs which matched the change at that point in time | 16:04 |
*** ysandeep is now known as ysandeep|out | 16:04 | |
fungi | here's the debug log entries for that event as pertains to the check pipeline: https://paste.opendev.org/show/813027 | 16:07 |
fungi | i wonder if cached layouts are unreliable immediately following branch creation | 16:09 |
frickler | fungi: seems tenant reconfiguration took a long time. designateclient was in the list at 11:02:47, after the above patch was submitted | 16:16 |
frickler | https://paste.opendev.org/show/b6vT47il2LMEZfs9hq3A/ | 16:17 |
elodilles | does that mean stable/yoga was not yet created when the check job trigger arrived? | 16:18 |
frickler | elodilles: the branch was likely created in gerrit, but zuul hadn't updated its configuration with it | 16:18 |
elodilles | frickler: ack | 16:18 |
frickler | are the .gitreview patches created automatically by the branch patch or is that a different step? | 16:19 |
fungi | yeah, gerrit wouldn't allow pushing a change for a nonexistent branch | 16:19 |
frickler | question is how easy it would be to delay them for maybe some hours | 16:19 |
frickler | the other option might be to have zuul hold processing a patch when it hasn't reconfigured yet, no idea how feasible that might be | 16:20 |
clarkb | I thought that was what it did already | 16:21 |
fungi | this race does seem like a regression in zuul though, so presumably something we'll want to solve there | 16:21 |
clarkb | ya maybe it changed from that behavior which I had thought already happened. maybe when we switched the way pipelines are processed to only processing them with events | 16:21 |
fungi | the openstack release workflow hasn't changed recently | 16:21 |
clarkb | ya zuul had some pipeline processing updates to make it quicker and more efficient | 16:22 |
clarkb | it's possible that had a side effect on how it handles reconfigurations, allowing things to skip ahead maybe? | 16:22 |
*** dviroel|lunch is now known as dviroel | 16:27 | |
clarkb | jentoio: fungi: the latest failure is my fault. Sorry about that. I've suggested in review that we just go back to root:root ownership in the handler file. That will work with the preexisting permissions. | 16:27 |
jentoio | clarkb: thanks for following up. I think this has been a good learning task and we will find these issues as we go. I'll add the feedback after some coffee. | 16:33 |
fungi | it's also a great demonstration of the testing, and how/where to look to identify errors | 16:35 |
jentoio | agreed | 16:38 |
fungi | clarkb: all the topic:retirement changes have merged, and i bulk-abandined any open changes for those repos as i went | 16:40 |
fungi | so we're probably ready for the acl phase? | 16:40 |
fungi | er, bulk-abandoned | 16:41 |
clarkb | fungi: ya acl and zuul cleanup is next | 16:41 |
clarkb | I can work on that soon. I've also got a todo to update our gitea 1.16 change to 1.16.3 which released overnight | 16:42 |
fungi | looking back at repos which still have open changes, i wonder if we should also retire any of puppet-accessbot, puppet-jeepyb, puppet-logrotate, puppet-mysql_backup, puppet-openstackid, puppet-packagekit, puppet-redis, puppet-snmpd, puppet-stackalytics | 16:44 |
corvus | i'd like to rolling-restart all of zuul; any objections? | 16:44 |
corvus | (step 1 would be 6+ hours of executor restarts, followed by scheduler and web) | 16:44 |
fungi | corvus: no objections from me. also see above discussion of possible race regression with the scheduler finding no jobs for a change pushed immediately following creation of the branch it targets | 16:45 |
corvus | (step 2 would involve us going down to 0 schedulers briefly due to a sql database migration. we would miss some events, but in-flight jobs would be unaffected) | 16:45 |
fungi | will this be our first rolling restart with a db migration? | 16:46 |
fungi | i can't recall | 16:46 |
corvus | fungi: i think that behavior is likely quite old... i think that can happen because we generate reconfiguration events in the trigger processing phase, and they are processed on the next pass through the main loop since they are management events. while that could happen even with one scheduler, it may be compounded by having multiple schedulers since the others can continue to process pipelines up until the actual reconfiguration starts. | 16:48 |
corvus | i think the behavior would date at least to the start of the sos work, if not before. | 16:48 |
corvus | fungi: i think this is the 2nd or 3rd, but this one wants to be done without any other schedulers running so they don't keep inserting wrong data | 16:49 |
fungi | ahh, thanks, yes we probably haven't noticed that race until now since this is the first time openstack has been doing bulk branch creation after we started running with a second scheduler, so the chances we'd see it may just be higher now | 16:52 |
*** dviroel__ is now known as dviroel | 16:56 | |
opendevreview | Jack Morgan proposed opendev/system-config master: Adds support for running zuul-registry as a non-root user https://review.opendev.org/c/opendev/system-config/+/831462 | 17:07 |
*** marios is now known as marios|out | 17:09 | |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 17:11 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Update Gitea to 1.16.3 https://review.opendev.org/c/opendev/system-config/+/828184 | 17:19 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM change to test and hold gitea 1.16 https://review.opendev.org/c/opendev/system-config/+/828586 | 17:19 |
clarkb | infra-root ^ I cleaned up the old autohold and put a new one in place for that new ps | 17:21 |
fungi | oh! it's released, yay | 17:21 |
clarkb | yup | 17:22 |
clarkb | I've approved the zuul-registry bug fix change and its followup test improvement | 17:30 |
clarkb | fungi: ^ once that lands and the docker image is promoted we should be able to recheck your gerrit apache change | 17:30 |
clarkb | fungi: I'm starting up on the next retirement change. Any sense for whether or not it would help to break this up into say groups of 10 for reviewers? Or just do one large change? | 17:35 |
opendevreview | Merged zuul/zuul-jobs master: [ensure-python] Improve check for CentOS/RHEL 9 packages https://review.opendev.org/c/zuul/zuul-jobs/+/831423 | 17:37 |
clarkb | I think I'll do an all in one to rip the bandaid off | 17:40 |
fungi | clarkb: one large change is preferable for me, as long as we can prioritize it so that it's not in perpetual merge-conflict | 17:47 |
clarkb | ++ | 17:50 |
fungi | clarkb: any thoughts on whether we should also retire any/all of the 9 repos i mentioned at 16:44 utc? i've lost track of your original pad which had the list of repos to retire | 17:53 |
fungi | not sure if they got missed or were intentionally skipped for now | 17:53 |
fungi | there may also be more, i merely noticed those because they have open changes showing in gertty | 17:54 |
opendevreview | Clark Boylan proposed openstack/project-config master: Finalize batch of opendev repo retirements https://review.opendev.org/c/openstack/project-config/+/831837 | 17:58 |
clarkb | fungi: https://etherpad.opendev.org/p/opendev-repo-retirements is the etherpad | 17:58 |
fungi | aha, thanks! | 17:59 |
clarkb | fungi: puppet-accessbot, puppet-stackalytics and puppet-openstackid are done already. logrotate, jeepyb, mysql redis etc are still in use | 17:59 |
fungi | okay, so some of those probably just need changes abandoned | 18:00 |
clarkb | fungi: redis is used by ethercalc. I think logrotate is used by anything with logs we rotate that is puppet deployed | 18:00 |
fungi | got it, so we didn't replace puppet-logrotate with a general ansible role | 18:00 |
clarkb | fungi: oh yup if there are open changes for puppet-accessbot, puppet-stackalytics, or puppet-openstackid I'm sure those can be abandoned | 18:00 |
fungi | and yeah, i keep forgetting ethercalc is still puppeted | 18:00 |
*** jpena is now known as jpena|off | 18:01 | |
fungi | i will abandon open changes on those three if the acl still permits | 18:01 |
clarkb | oh right we may need to become a super user or something | 18:01 |
fungi | it seems to have allowed me to abandon those | 18:03 |
clarkb | thank you for taking care of the change abandonments | 18:04 |
fungi | puppet-snmpd isn't still needed is it? | 18:05 |
clarkb | that one might not be needed as ansible is likely configuring that | 18:05 |
clarkb | fungi: I've rechecked https://review.opendev.org/c/opendev/system-config/+/829975 which will exercise the new registry update | 18:05 |
clarkb | fungi: the easiest way to check is push up a change that removes it from system-config's modules.env | 18:06 |
fungi | oh, yep. i'll do that | 18:06 |
clarkb | if that doesn't break due to the module being gone then it shouldn't be used | 18:06 |
clarkb | ugh the gerrit 3.5 builds are broken now I think because the submodule didn't get updated when they updated gerrit to handle the hash thing | 18:29 |
fungi | :/ | 18:32 |
fungi | i guess they normally bump submodule refs on a schedule or something? | 18:33 |
fungi | clarkb: https://review.opendev.org/729321 removed puppet-snmpd from our modules.env a year ago | 18:36 |
fungi | so i guess it's unused | 18:36 |
clarkb | fungi: in this case the submodule had to be updated in concert with the change we made for ${hash} to actually get the hash because davido didn't want to keep the backward compatible method | 18:39 |
clarkb | but now looking at gerrit 3.5 I don't see where their plugin/gitiles commits are coming from on the gitiles side | 18:39 |
clarkb | they don't seem to line up at all | 18:39 |
clarkb | oh I think I see it | 18:39 |
clarkb | They moved plugin gitiles to master from stable-3.5 | 18:39 |
clarkb | stable-3.5 would work | 18:40 |
clarkb | ugh | 18:40 |
clarkb | hrm or did I somehow end up with master checked out as stable-3.5? this whole thing is very confusing | 18:41 |
clarkb | ok for some reason my local stable-3.4 was checked out against master for gerrit, which is why I saw it using master plugins/gitiles. I don't know how that got confused, but I guess I didn't have it set up to track origin/stable-3.5 ugh | 18:46 |
clarkb | I think I see the problem though | 18:46 |
corvus | beginning the merger/executor rolling restarts now | 18:47 |
opendevreview | Clark Boylan proposed opendev/system-config master: Pull gerrit/plugins/gitiles from stable branch not tag https://review.opendev.org/c/opendev/system-config/+/831839 | 18:48 |
clarkb | fungi: ^ I think we can rebase onto that. What happened was plugins/gitiles didn't have a stable-3.5 branch until they merged in our bugfix and at that point they branched it | 18:49 |
clarkb | good news is the 3.4 build completed and the registry didn't break it so the registry seems to be working at least with one build against it | 18:49 |
clarkb | and I think the lack of the previous branch may have confused my checkouts locally? | 18:50 |
fungi | ahh, cool | 18:51 |
clarkb | fungi: do you want to rebase or should I? | 18:54 |
clarkb | or do we want to wait for it to show it works first? | 18:54 |
fungi | i was going to just wait for it to merge | 19:00 |
fungi | but i can rebase it if you like | 19:00 |
fungi | the rebase isn't really going to tell us whether the registry fix worked though, until it's actually in the gate | 19:01 |
clarkb | fungi: it will check the registry in check as well | 19:02 |
fungi | it was succeeding in check before the registry fix though, only failing in the gate pipeline | 19:02 |
clarkb | it was just bad luck previously that we hit it more often in the gate (I think because gate node provisioning has priority so more likely to get nodes assigned together) | 19:02 |
clarkb | ya I think that was pure luck | 19:02 |
fungi | ahh, okay | 19:03 |
clarkb | since it is a race | 19:03 |
clarkb | and now an ssl connection error to maven trying to download a jar | 19:15 |
clarkb | eventually we'll start making forward progress | 19:15 |
clarkb | ianw: fungi: I'm not sure the set -o pipefail fix is working for test playbooks https://zuul.opendev.org/t/openstack/build/41d620b615f9460ca45065acb947aea5/log/job-output.txt that should've failed but it succeeded | 19:26 |
fungi | huh... | 19:27 |
clarkb | looking more closely at it https://zuul.opendev.org/t/openstack/build/41d620b615f9460ca45065acb947aea5/console#3/1/30/bridge.openstack.org seems to indicate we didn't set pipefail on that command | 19:27 |
clarkb | but we updated it to do so | 19:27 |
clarkb | did zuul not merge into master before running that? | 19:27 |
clarkb | Oh or maybe the nested ansible is using a literal checkout without the merge? | 19:27 |
clarkb | there is a bug here somewhere and I suspect manually rebasing would address it | 19:28 |
clarkb | the playbook that is stale is defined as the run playbook for the system-config-run job. I think this may be a zuul bug | 19:30 |
clarkb | corvus: ^ fyi we don't seem to be using merged git state when constructing the job playbooks | 19:31 |
clarkb | https://zuul.opendev.org/t/openstack/build/35731d71aafb442c98609d28a94ac4d2/console#3/1/30/bridge.openstack.org which is fungi's gerrit apache update did use an updated job playbook | 19:33 |
clarkb | which implies this isn't a consistent issue as that change predates the set -o pipefail too iirc | 19:33 |
clarkb | corvus: maybe a caching bug where we've cached stale info for a change and are not re-merging it even though master has moved ahead? | 19:33 |
clarkb | I'm going to try a recheck and see if this is consistent | 19:35 |
fungi | you're sure we didn't just miss adding pipefail to one of the invocations? | 19:37 |
corvus | clarkb: it looks like https://zuul.opendev.org/t/openstack/build/41d620b615f9460ca45065acb947aea5/console#3/1/30/bridge.openstack.org ran more than a week before https://review.opendev.org/831465 was written and merged | 19:38 |
corvus | so i think that means we would not expect pipefail to be present in that invocation | 19:38 |
clarkb | oh hrm. Did gerrit somehow serve me a stale version of the change with a zuul +1 verified? | 19:39 |
clarkb | I've hit refresh and it is gone now which should've been the case when I pushed the new patchset | 19:39 |
clarkb | ugh sorry, I used the zuul summary and didn't notice it was the stale result and gerrit was still showing +1. I did a hard refresh and it is gone | 19:40 |
corvus | np. zuul tests the future, just not like that. | 19:40 |
clarkb | https://zuul.opendev.org/t/openstack/build/fd01e2a702c64e228db0e5b060e211bc is the one for the most recent ps which zuul status reports as failed, but opening the link says it does not exist | 19:40 |
clarkb | maybe it hasn't reported yet so the db can't find it | 19:41 |
fungi | yeah, i think that future may have needed sarah connor | 19:41 |
clarkb | I have too many pots on the fire right now | 19:41 |
ianw | ... so the conclusion is that it's probably working? | 19:42 |
clarkb | ianw: yes sorry | 19:42 |
clarkb | though why the build for the latest ps isn't yet available is another mystery | 19:42 |
clarkb | 828586,3 seems maybe stuck in limbo waiting on the paused job to complete | 19:44 |
clarkb | I guess that could be related to trying to process some events though | 19:44 |
corvus | builds should exist in the db as soon as they start now | 19:45 |
clarkb | corvus: hrm that one shows as not existing and is completed | 19:45 |
clarkb | the paused build is finishing up now at least so ya likely waiting on an event to be processed | 19:46 |
corvus | clarkb: your answer is in gerrit now | 19:47 |
clarkb | I've approved https://review.opendev.org/c/opendev/system-config/+/831839 as it passed testing second time around (yay no more ssl problems) | 19:47 |
clarkb | corvus: aha the parent build failed so it couldn't run at all hence no record? | 19:48 |
corvus | (in a comment on https://review.opendev.org/828586 ) | 19:48 |
corvus | yep. we could probably try to get more info into the status json for that case. | 19:48 |
clarkb | that would be great. Sorry for all the noise here I really do have too many things on the fire and should start trimming | 19:49 |
corvus | (but at least it's consistent with the behavior i described -- we get a build record in the db when the build starts -- and this one didn't start :) | 19:49 |
clarkb | ++ | 19:49 |
corvus | we just forgot failing without starting is an option | 19:50 |
clarkb | corvus: maybe even changing failed to not startable would be good | 19:50 |
clarkb | as the status I mean | 19:50 |
corvus | yes. not trivial. | 19:51 |
ianw | fungi: speaking of logs; https://review.opendev.org/q/topic:system-config-encrypt-logs are two that could use your eye if you have time. one updates system-config docs to explain adding your keys, the other is an attempt to cover the expiring keys issue you brought up | 19:55 |
clarkb | I'm going to find lunch but 829975 can be rechecked as soon as 831839 lands | 19:57 |
opendevreview | Merged opendev/system-config master: Pull gerrit/plugins/gitiles from stable branch not tag https://review.opendev.org/c/opendev/system-config/+/831839 | 20:18 |
clarkb | two successful image builds on 829975. That is a really good sign | 20:47 |
clarkb | jentoio: fungi: I +2'd the change but didn't +A it because I'm not in a good spot today to watch it :/ too many things as noted before. Hopefully I can +A tomorrow morning though. Or someone else can approve it if they have time | 20:50 |
ianw | clarkb: hrm, https://zuul.opendev.org/t/zuul/build/9d8641e33e9f4eeab69bb705fb09a664 failed in the "printf "1\n2\n3\n4\n" | xargs -P 4 -I DNE podman push localhost:9000/test/image" | 20:55 |
ianw | Error: error copying image to the remote destination: Error writing blob: Patch "https://localhost:9000/v2/test/image/blobs/uploads/80cd43a81d594870970be61089320839": net/http: TLS handshake timeout | 20:55 |
ianw | https://review.opendev.org/c/zuul/zuul-registry/+/831339 then kicked itself out, didn't appear to have started the job | 20:59 |
clarkb | ianw: tls handshake timeout is interesting | 21:13 |
ianw | i can't see anything in the zuul-registry logs | 21:14 |
clarkb | ianw: I think that is well above any of the changes I was making and trying to test. Could it be that we just don't have enough entropy in that environment to do that many connections? | 21:14 |
ianw | that seems to suggest that the registry-side was not responding | 21:14 |
clarkb | ianw: https://zuul.opendev.org/t/zuul/build/9d8641e33e9f4eeab69bb705fb09a664/log/docker/functionaltest_registry_1.txt that has logs though | 21:15 |
clarkb | or do you mean you can't see anything related to the failure in the logs? | 21:15 |
clarkb | ya my suspicion is that something a layer in front of us failed and maybe due to lack of entropy? | 21:15 |
ianw | yeah, nothing relating to the failure, it doesn't mention 80cd43... | 21:15 |
ianw | https://review.opendev.org/c/zuul/zuul-registry/+/831846 will turn that up to debug | 21:16 |
clarkb | +2 I bet we need cherrypy logs to see what happened there | 21:16 |
ianw | clarkb: also https://review.opendev.org/c/zuul/zuul-jobs/+/831326 is required for that focal stack, if you could poke at that one | 21:17 |
ianw | just installs the containernetworking package to stop it warning about missing plugins | 21:17 |
clarkb | looking | 21:17 |
ianw | i should actually unstack that DEBUG one | 21:18 |
clarkb | how did the centos-8 jobs stay in there? I thought I cleaned them all up | 21:18 |
ianw | yeah, i guess we just missed a grep | 21:19 |
clarkb | 829975 has entered the gate | 21:25 |
clarkb | looks like gitea 1.16.3 doesn't fully fix problems with complex diffs for deleted/renamed files https://158.69.67.50:3081/opendev/system-config/commit/1d5f5a7657bd6c6c4af7506d1f3dd3aa9a5187bc but https://158.69.67.50:3081/opendev/system-config/commit/25cdc979507f1b3ec68781a541c0b196bd451f2f does look a bit better than before | 21:34 |
clarkb | https://158.69.67.50:3081/opendev/system-config/commit/8f8100ed28d15bdad935b82dbfd6bb2d35203614 looks better too | 21:35 |
clarkb | Looks like 1.15.11 struggles with the first example too so that isn't a regression https://opendev.org/opendev/system-config/commit/1d5f5a7657bd6c6c4af7506d1f3dd3aa9a5187bc | 21:36 |
*** dviroel is now known as dviroel|out | 21:37 | |
ianw | clarkb: https://zuul.opendev.org/t/zuul/build/9906bc439b334fbaa5cdf20d05b11b3a/logs -- i guess in the success case we don't collect any logs, because the testing registry has already exited. i'm not sure if this is a bug or feature | 22:11 |
clarkb | ianw: oh interesting. I would consider that a bug | 22:13 |
ianw | it would probably be better to save individual logs for each test | 22:16 |
*** dviroel|out is now known as dviroel | 22:21 | |
opendevreview | Merged opendev/system-config master: Block access to Gitiles https://review.opendev.org/c/opendev/system-config/+/829975 | 22:22 |
clarkb | that was a long time coming | 22:22 |
clarkb | fungi: ^ finally | 22:23 |
fungi | yay! | 22:38 |
fungi | but so much more satisfying | 22:38 |
clarkb | corvus: infra-prod-service-review is waiting on semaphore infra-prod-playbook, I don't see any other deploy or opendev-prod-hourly deployment jobs that might be holding it. | 22:54 |
clarkb | When that job was enqueued there were opendev-prod-hourly jobs running with that semaphore held. I wonder if we didn't unlock? | 22:55 |
clarkb | I think we have a cleanup routine to find those and remove them though I don't recall how often they run | 22:55 |
clarkb | but calling it out in case we get unstuck by the cleanup routines and this gets missed as it is possibly a bug | 22:56 |
corvus | the semaphore cleanup routine runs very frequently (5m i think) | 23:00 |
corvus | to avoid such issues | 23:00 |
clarkb | ya it just started | 23:00 |
clarkb | ya not sure how important it is to try and track down those instances to see if they are fixable | 23:00 |
corvus | clarkb: do you think it was an error or just a delay? | 23:00 |
clarkb | corvus: other pipelines were being processed at the time (check was adding jobs) but I guess we process them separately and possibly on different schedulers. But the queue values were all 0 for some time too | 23:01 |
clarkb | I suppose it is possible the delay was due to not getting a scheduler to process the pipeline as they were busy with other pipelines | 23:01 |
corvus | we "only" have 2 :) | 23:02 |
corvus | 2022-03-03 22:36:05,760 INFO zuul.zk.SemaphoreHandler: [e: a043c82fc92c4d3d9eca877e959371d6] Semaphore /zuul/semaphores/openstack/infra-prod-playbook released for {'buildset_path': '/zuul/tenant/openstack/pipeline/opendev-prod-hourly/item/3079ee12761f47bc83576a0d2260829b/buildset/a723807d99cb4ee78eb4427cfdfcd7d9', 'job_name': 'infra-prod-service-eavesdrop'} | 23:03 |
corvus | 2022-03-03 22:59:44,546 INFO zuul.zk.SemaphoreHandler: [e: c7a93cc257a34d6282c555f51f02fbca] Semaphore infra-prod-playbook acquired: job infra-prod-service-review, item <QueueItem 23df53f0d1964ae7a7c5dd0cba6660e4 for <Change 0x7f234c182970 opendev/system-config 829975,3> in deploy> | 23:03 |
corvus | is that the sequence you're looking at? | 23:03 |
clarkb | corvus: ya | 23:04 |
corvus | hypothesis: the semaphore release in pipeline prod-hourly did not trigger a pipeline run of deploy | 23:04 |
corvus | (and deploy just sat there waiting for a triggering event) | 23:04 |
clarkb | oh! since that is a bit more optimized now | 23:04 |
corvus | ya | 23:04 |
corvus | should be able to confirm by looking for zero deploy pipeline runs in that timeframe | 23:05 |
clarkb | I guess that would be another condition to check for setting the refreshed flag? | 23:05 |
clarkb | "any pending locks" though I'm not sure how easy that is to do | 23:05 |
clarkb | fungi: I think the url handling for gitiles is in place now if you want to double check it | 23:07 |
corvus | the logs confirm the hypothesis | 23:08 |
corvus | i think fixing this will be tricky | 23:11 |
corvus | -> #zuul | 23:11 |
fungi | wget https://review.opendev.org/plugins/gitiles/opendev/system-config/ | 23:14 |
fungi | ERROR 403: Forbidden. | 23:14 |
fungi | lgtm | 23:14 |
clarkb | successful failure | 23:16 |
fungi | that's roughly what it said beneath my yearbook photo too | 23:18 |
*** rlandy|ruck is now known as rlandy|out | 23:36 | |
NeilHanlon | mnasiadka: it looks like the latest build worked on nb01.opendev.org, so I think we're nearly there for rocky 8 nodes | 23:46 |
NeilHanlon | was away this week so I've not been paying as much attention unfortunately | 23:46 |
clarkb | NeilHanlon: ya I think it's largely at a "use it and see what breaks" point now | 23:48 |
NeilHanlon | ack, thanks clarkb :) also turns out you were right about the node definition in jobs.yaml needing the name: key despite the doc's insistence otherwise heh | 23:53 |
clarkb | infra-root I've noticed that some airship jobs are hitting node failures due to the removal we did. Not much we can do about that, but thought I'd mention it | 23:59 |
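
Editor's sketch of the race discussed between 15:59 and 16:52: a change pushed right after branch creation is matched against a layout that does not yet know the branch, so the scheduler logs "No jobs for change", while a later recheck finds jobs. This is a toy model under stated assumptions, not Zuul's code: the class `ToyScheduler`, its methods, and the job name `example-unit-tests` are all hypothetical, and the only behavior modeled is that the reconfiguration is queued and applied on a later pass than the trigger event.

```python
"""Toy model of the branch-creation race (hypothetical, not Zuul's code)."""
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    # Hypothetical layout: branch name -> jobs configured for that branch.
    layout: dict = field(default_factory=dict)
    # Reconfigurations queued for a later pass through the main loop.
    pending_reconfigs: list = field(default_factory=list)

    def branch_created(self, branch, jobs):
        # Creating a branch only queues a reconfiguration; the layout is
        # not updated until apply_reconfigs() runs.
        self.pending_reconfigs.append((branch, jobs))

    def patchset_created(self, branch):
        # The change is matched against whatever layout exists right now.
        jobs = self.layout.get(branch, [])
        return jobs if jobs else "No jobs for change"

    def apply_reconfigs(self):
        # The layout catches up with every queued branch.
        for branch, jobs in self.pending_reconfigs:
            self.layout[branch] = jobs
        self.pending_reconfigs.clear()


sched = ToyScheduler()
sched.branch_created("stable/yoga", ["example-unit-tests"])
first = sched.patchset_created("stable/yoga")    # arrives before the reconfig
sched.apply_reconfigs()                          # reconfiguration completes
recheck = sched.patchset_created("stable/yoga")  # the later "recheck"
print(first)
print(recheck)
```

In this model the first event finds no jobs and the recheck succeeds, which matches what johnsom observed. The two options frickler raised at 16:19 and 16:20 correspond to delaying `patchset_created` or making it wait while `pending_reconfigs` is non-empty.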