Thursday, 2022-03-03

*** dviroel|out is now known as dviroel01:30
*** rlandy|ruck|bbl is now known as rlandy|ruck01:50
*** rlandy|ruck is now known as rlandy|out01:59
*** dviroel is now known as dviroel|out02:02
opendevreviewMerged opendev/puppet-apparmor master: Retire this repo  https://review.opendev.org/c/opendev/puppet-apparmor/+/82975902:44
opendevreviewMerged opendev/puppet-askbot master: Retire this repo  https://review.opendev.org/c/opendev/puppet-askbot/+/82976002:44
opendevreviewMerged opendev/puppet-asterisk master: Retire this repo  https://review.opendev.org/c/opendev/puppet-asterisk/+/82976102:45
opendevreviewMerged opendev/puppet-bandersnatch master: Retire this repo  https://review.opendev.org/c/opendev/puppet-bandersnatch/+/82976202:46
opendevreviewMerged opendev/puppet-bugdaystats master: Retire this repo  https://review.opendev.org/c/opendev/puppet-bugdaystats/+/82976302:48
opendevreviewMerged opendev/puppet-ciwatch master: Retire this repo  https://review.opendev.org/c/opendev/puppet-ciwatch/+/82976402:51
opendevreviewMerged opendev/puppet-diskimage_builder master: Retire this repo  https://review.opendev.org/c/opendev/puppet-diskimage_builder/+/82976602:54
opendevreviewMerged opendev/puppet-germqtt master: Retire this repo  https://review.opendev.org/c/opendev/puppet-germqtt/+/82976702:54
opendevreviewMerged opendev/puppet-grafyaml master: Retire this repo  https://review.opendev.org/c/opendev/puppet-grafyaml/+/82976802:54
opendevreviewMerged opendev/puppet-graphite master: Retire this repo  https://review.opendev.org/c/opendev/puppet-graphite/+/82976902:55
opendevreviewMerged opendev/puppet-haveged master: Retire this repo  https://review.opendev.org/c/opendev/puppet-haveged/+/82979002:55
opendevreviewMerged opendev/puppet-hound master: Retire this repo  https://review.opendev.org/c/opendev/puppet-hound/+/82979102:56
opendevreviewMerged opendev/puppet-infra-cookiecutter master: Retire this repo  https://review.opendev.org/c/opendev/puppet-infra-cookiecutter/+/82979302:56
opendevreviewMerged opendev/puppet-jenkins master: Retire this repo  https://review.opendev.org/c/opendev/puppet-jenkins/+/82979402:56
opendevreviewMerged opendev/puppet-kerberos master: Retire this repo  https://review.opendev.org/c/opendev/puppet-kerberos/+/82979502:57
opendevreviewMerged opendev/puppet-lodgeit master: Retire this repo  https://review.opendev.org/c/opendev/puppet-lodgeit/+/82979602:57
opendevreviewMerged opendev/puppet-lpmqtt master: Retire this repo  https://review.opendev.org/c/opendev/puppet-lpmqtt/+/82979802:58
opendevreviewMerged opendev/puppet-mailman master: Retire this repo  https://review.opendev.org/c/opendev/puppet-mailman/+/82979902:58
opendevreviewMerged opendev/puppet-mediawiki master: Retire this repo  https://review.opendev.org/c/opendev/puppet-mediawiki/+/82980002:59
opendevreviewMerged opendev/puppet-meetbot master: Retire this repo  https://review.opendev.org/c/opendev/puppet-meetbot/+/82980102:59
opendevreviewMerged opendev/puppet-mosquitto master: Retire this repo  https://review.opendev.org/c/opendev/puppet-mosquitto/+/82980202:59
opendevreviewMerged opendev/puppet-mqtt_statsd master: Retire this repo  https://review.opendev.org/c/opendev/puppet-mqtt_statsd/+/82980303:00
opendevreviewMerged opendev/puppet-nodepool master: Retire this repo  https://review.opendev.org/c/opendev/puppet-nodepool/+/82980603:01
opendevreviewMerged opendev/puppet-openafs master: Retire this repo  https://review.opendev.org/c/opendev/puppet-openafs/+/82980703:02
opendevreviewMerged opendev/puppet-openstackci master: Retire this repo  https://review.opendev.org/c/opendev/puppet-openstackci/+/82980803:02
opendevreviewMerged opendev/puppet-pgsql_backup master: Retire this repo  https://review.opendev.org/c/opendev/puppet-pgsql_backup/+/82980903:03
opendevreviewMerged opendev/puppet-planet master: Retire this repo  https://review.opendev.org/c/opendev/puppet-planet/+/82981003:03
opendevreviewMerged opendev/puppet-ptgbot master: Retire this repo  https://review.opendev.org/c/opendev/puppet-ptgbot/+/82981103:03
opendevreviewMerged opendev/puppet-puppet master: Retire this repo  https://review.opendev.org/c/opendev/puppet-puppet/+/82981203:04
opendevreviewMerged opendev/puppet-refstack master: Retire this repo  https://review.opendev.org/c/opendev/puppet-refstack/+/82981403:04
opendevreviewMerged opendev/puppet-ssl_cert_check master: Retire this repo  https://review.opendev.org/c/opendev/puppet-ssl_cert_check/+/82981503:05
opendevreviewMerged opendev/puppet-statusbot master: Retire this repo  https://review.opendev.org/c/opendev/puppet-statusbot/+/82981603:05
opendevreviewMerged opendev/puppet-sudoers master: Retire this repo  https://review.opendev.org/c/opendev/puppet-sudoers/+/82981703:06
opendevreviewMerged opendev/puppet-translation_checksite master: Retire this repo  https://review.opendev.org/c/opendev/puppet-translation_checksite/+/82981803:06
opendevreviewMerged opendev/puppet-unattended_upgrades master: Retire this repo  https://review.opendev.org/c/opendev/puppet-unattended_upgrades/+/82981903:06
opendevreviewMerged opendev/puppet-unbound master: Retire this repo  https://review.opendev.org/c/opendev/puppet-unbound/+/82982003:07
opendevreviewMerged opendev/puppet-zuul master: Retire this repo  https://review.opendev.org/c/opendev/puppet-zuul/+/82982103:07
opendevreviewMerged opendev/askbot-theme master: Retire this repo  https://review.opendev.org/c/opendev/askbot-theme/+/82982203:07
opendevreviewMerged opendev/germqtt master: Retire this repo  https://review.opendev.org/c/opendev/germqtt/+/82982303:10
opendevreviewMerged opendev/lpmqtt master: Retire this repo  https://review.opendev.org/c/opendev/lpmqtt/+/82982403:10
opendevreviewMerged opendev/mqtt_statsd master: Retire this repo  https://review.opendev.org/c/opendev/mqtt_statsd/+/82982503:11
*** ysandeep|out is now known as ysandeep04:36
*** ysandeep is now known as ysandeep|lunch07:21
*** ysandeep|lunch is now known as ysandeep07:49
*** ricolin is now known as Guest111207:59
*** ricolin_ is now known as ricolin07:59
yoctozeptoclarkb: hi! yeah, I was out; I confirm it's working now08:10
*** jpena|off is now known as jpena08:37
*** ysandeep is now known as ysandeep|afk10:25
*** rlandy|out is now known as rlandy|ruck11:12
*** dviroel|out is now known as dviroel11:15
*** ysandeep|afk is now known as ysandeep11:17
*** pojadhav- is now known as pojadhav12:44
*** ricolin_ is now known as ricolin14:13
*** rlandy|ruck is now known as rlandy|ruck|mtg15:01
*** dviroel is now known as dviroel|lunch15:22
johnsomAny idea why this patch didn't launch in Zuul? https://review.opendev.org/c/openstack/python-designateclient/+/83168715:45
fricklerjohnsom: I was just asking the same in the release channel15:47
fricklerI'm not sure whether we're missing branching devstack before branching clients15:47
fricklergmann: ^^ ?15:47
fungiis it possible the jobs are too aggressively limiting what files trigger them, and a change which only alters .gitreview matches no jobs?15:48
fungii would expect the new branch of designate to fall back to finding job definitions in devstack's master branch if it lacks a stable/yoga15:49
fricklerfungi: for the .gitreview change that might explain it, but there's also the tox.ini one. and seems to affect a couple of repos, though not everyone15:51
fungiahh, okay15:51
fungisetting debug in a dnm change to that branch might give more info15:51
johnsomHmmm, someone just rechecked and it launched15:52
johnsomThis one is in the same state: https://review.opendev.org/c/openstack/python-designateclient/+/83169015:53
johnsomI can recheck these, but if you want me to hold off for debugging let me know15:53
johnsomIt looks like all of the Octavia bot-produced patches didn't launch either. Six there.15:55
gmannfrickler: johnsom we need to wait for all projects to finish branching, and then we branch devstack and grenade15:57
johnsomThey rechecked the other one now too. Again, it launched this time15:58
gmann"all projects" means the devstack-supported projects, for which the release team gives us the signal15:58
gmannif we branch devstack before the projects it depends on, then it will fail in clone15:59
fricklerhmm, if recheck works, it might be some race condition in zuul or gerrit between creating the branch and checking patches on that branch15:59
fungizuul01 saw the initial patchset upload... 2022-03-03 10:54:05,369 DEBUG zuul.Pipeline.openstack.check: [e: 88388695ddd04990a7108eaf0a1a84d2] Event <GerritTriggerEvent patchset-created opendev.org/openstack/python-designateclient stable/yoga 831687,1> for change <Change 0x7f689eb0e940 openstack/python-designateclient 831687,1> matched <GerritEventFilter connection: gerrit types: patchset-created15:59
fungiignore_deletes: True> in pipeline <IndependentPipelineManager check>15:59
johnsomYeah, I don't think these patches even need devstack for testing. They only do unit tests, etc.16:00
fungi2022-03-03 10:54:25,884 INFO zuul.Pipeline.openstack.check: [e: 88388695ddd04990a7108eaf0a1a84d2] Adding change <Change 0x7f689eb0e940 openstack/python-designateclient 831687,1> to queue <ChangeQueue check: > in <Pipeline check>16:00
fungi2022-03-03 10:56:16,594 DEBUG zuul.Pipeline.openstack.check: [e: 88388695ddd04990a7108eaf0a1a84d2] No jobs for change <Change 0x7f689eb0e940 openstack/python-designateclient 831687,1>16:04
fungiso, yeah, it thought there were no jobs which matched the change at that point in time16:04
*** ysandeep is now known as ysandeep|out16:04
fungihere's the debug log entries for that event as pertains to the check pipeline: https://paste.opendev.org/show/81302716:07
fungii wonder if cached layouts are unreliable immediately following branch creation16:09
fricklerfungi: seems tenant reconfiguration took a long time. designateclient was in the list at 11:02:47, after the above patch was submitted16:16
fricklerhttps://paste.opendev.org/show/b6vT47il2LMEZfs9hq3A/16:17
elodillesdoes that mean stable/yoga was not yet created when the check job trigger arrived?16:18
fricklerelodilles: the branch was likely created in gerrit, but zuul hadn't updated its configuration with it16:18
elodillesfrickler: ack16:18
fricklerare the .gitreview patches created automatically by the branch patch or is that a different step?16:19
fungiyeah, gerrit wouldn't allow pushing a change for a nonexistent branch16:19
fricklerquestion is how easy it would be to delay them for maybe some hours16:19
fricklerthe other option might be to have zuul hold processing a patch when it hasn't reconfigured yet, no idea how feasible that might be16:20
clarkbI thought that was what it did already16:21
fungithis race does seem like a regression in zuul though, so presumably something we'll want to solve there16:21
clarkbya maybe it changed from that behavior which I had thought already happened. maybe when we switched the way pipelines are processed to only processing them with events16:21
fungithe openstack release workflow hasn't changed recently16:21
clarkbya zuul had some pipeline processing updates to make it quicker and more efficient16:22
clarkbit's possible that had a side effect on how it handles reconfigurations, allowing things to skip ahead maybe?16:22
*** dviroel|lunch is now known as dviroel16:27
clarkbjentoio: fungi: the latest failure is my fault. Sorry about that. I've suggested in review that we just go back to root:root ownership in the handler file. That will work with the preexisting permissions.16:27
jentoioclarkb: thanks for following up. I think this has been a good learning task and we will find these issues as we go. I'll add the feedback after some coffee. 16:33
fungiit's also a great demonstration of the testing, and how/where to look to identify errors16:35
jentoioagreed16:38
fungiclarkb: all the topic:retirement changes have merged, and i bulk-abandined any open changes for those repos as i went16:40
fungiso we're probably ready for the acl phase?16:40
fungier, bulk-abandoned16:41
clarkbfungi: ya acl and zuul cleanup is next16:41
clarkbI can work on that soon. I've also got a todo to update our gitea 1.16 change to 1.16.3 which released overnight16:42
fungilooking back at repos which still have open changes, i wonder if we should also retire any of puppet-accessbot, puppet-jeepyb, puppet-logrotate, puppet-mysql_backup, puppet-openstackid, puppet-packagekit, puppet-redis, puppet-snmpd, puppet-stackalytics16:44
corvusi'd like to rolling-restart all of zuul; any objections?16:44
corvus(step 1 would be 6+ hours of executor restarts, followed by scheduler and web)16:44
fungicorvus: no objections from me. also see above discussion of possible race regression with the scheduler finding no jobs for a change pushed immediately following creation of the branch it targets16:45
corvus(step 2 would involve us going down to 0 schedulers briefly due to a sql database migration.  we would miss some events, but in-flight jobs would be unaffected)16:45
fungiwill this be our first rolling restart with a db migration?16:46
fungii can't recall16:46
corvusfungi: i think that behavior is likely quite old... i think that can happen because we generate reconfiguration events in the trigger processing phase, and they are processed on the next pass through the main loop since they are management events.  while that could happen even with one scheduler, it may be compounded by having multiple schedulers since the others can continue to process pipelines up until the actual reconfiguration starts.16:48
corvusi think the behavior would date at least to the start of the sos work, if not before.16:48
corvusfungi: i think this is the 2nd or 3rd, but this one wants to be done without any other schedulers running so they don't keep inserting wrong data16:49
fungiahh, thanks, yes we probably haven't noticed that race until now since this is the first time openstack has been doing bulk branch creation after we started running with a second scheduler, so the chances we'd see it may just be higher now16:52
*** dviroel__ is now known as dviroel16:56
opendevreviewJack Morgan proposed opendev/system-config master: Adds support for running zuul-registry as a non-root user  https://review.opendev.org/c/opendev/system-config/+/83146217:07
*** marios is now known as marios|out17:09
*** rlandy|ruck|mtg is now known as rlandy|ruck17:11
opendevreviewClark Boylan proposed opendev/system-config master: Update Gitea to 1.16.3  https://review.opendev.org/c/opendev/system-config/+/82818417:19
opendevreviewClark Boylan proposed opendev/system-config master: DNM change to test and hold gitea 1.16  https://review.opendev.org/c/opendev/system-config/+/82858617:19
clarkbinfra-root ^ I cleaned up the old autohold and put a new one in place for that new ps17:21
fungioh! it's released, yay17:21
clarkbyup17:22
clarkbI've approved the zuul-registry bug fix change and its followup test improvement17:30
clarkbfungi: ^ once that lands and the docker image is promoted we should be able to recheck your gerrit apache change17:30
clarkbfungi: I'm starting up on the next retirement change. Any sense for whether or not it would help to break this up into say groups of 10 for reviewers? Or just do one large change?17:35
opendevreviewMerged zuul/zuul-jobs master: [ensure-python] Improve check for CentOS/RHEL 9 packages  https://review.opendev.org/c/zuul/zuul-jobs/+/83142317:37
clarkbI think I'll do an all in one to rip the bandaid off17:40
fungiclarkb: one large change is preferable for me, as long as we can prioritize it so that it's not in perpetual merge-conflict17:47
clarkb++17:50
fungiclarkb: any thoughts on whether we should also retire any/all of the 9 repos i mentioned at 16:44 utc? i've lost track of your original pad which had the list of repos to retire17:53
funginot sure if they got missed or were intentionally skipped for now17:53
fungithere may also be more, i merely noticed those because they have open changes showing in gertty17:54
opendevreviewClark Boylan proposed openstack/project-config master: Finalize batch of opendev repo retirements  https://review.opendev.org/c/openstack/project-config/+/83183717:58
clarkbfungi: https://etherpad.opendev.org/p/opendev-repo-retirements is the etherpad17:58
fungiaha, thanks!17:59
clarkbfungi: puppet-accessbot, puppet-stackalytics and puppet-openstackid are done already. logrotate, jeepyb, mysql redis etc are still in use17:59
fungiokay, so some of those probably just need changes abandoned18:00
clarkbfungi: redis is used by ethercalc. I think logrotate is used by anything with logs we rotate that is puppet deployed18:00
fungigot it, so we didn't replace puppet-logrotate with a general ansible role18:00
clarkbfungi: oh yup if there are open changes for puppet-accessbot, puppet-stackalytics, or puppet-openstackid I'm sure those can be abandoned 18:00
fungiand yeah, i keep forgetting ethercalc is still puppeted18:00
*** jpena is now known as jpena|off18:01
fungii will abandon open changes on those three if the acl still permits18:01
clarkboh right we may need to become a super user or something18:01
fungiit seems to have allowed me to abandon those18:03
clarkbthank you for taking care of the change abandonments18:04
fungipuppet-snmpd isn't still needed is it?18:05
clarkbthat one might not be needed as ansible is likely configuring that18:05
clarkbfungi: I've rechecked https://review.opendev.org/c/opendev/system-config/+/829975 which will exercise the new registry update18:05
clarkbfungi: the easiest way to check is push up a change that removes it from system-config's modules.env18:06
fungioh, yep. i'll do that18:06
clarkbif that doesn't break due to the module being gone then it shouldn't be used18:06
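A quick local complement to the check described above is to see whether the module is still listed in the file at all. This is only a sketch: the temporary file stands in for system-config's modules.env, whose real format and contents are not shown in this log, and the `module_in_use` helper is hypothetical.

```shell
# Hypothetical stand-in for system-config's modules.env; the real file's
# format is not shown in this log, so these two entries are made up.
modules_env=$(mktemp)
printf '%s\n' 'puppet-logrotate' 'puppet-redis' > "$modules_env"

# Hypothetical helper: succeeds if the module name appears anywhere in the file.
module_in_use() {
    grep -qF -- "$1" "$modules_env"
}

for module in puppet-logrotate puppet-snmpd; do
    if module_in_use "$module"; then
        echo "$module: still listed"
    else
        echo "$module: not listed, candidate for retirement"
    fi
done
```

A name that is not listed is only a hint; pushing the removal and letting the tests run, as suggested above, is still the stronger check.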
clarkbugh the gerrit 3.5 builds are broken now I think because the submodule didn't get updated when they updated gerrit to handle the hash thing18:29
fungi:/18:32
fungii guess they normally bump submodule refs on a schedule or something?18:33
fungiclarkb: https://review.opendev.org/729321 removed puppet-snmpd from our modules.env a year ago18:36
fungiso i guess it's unused18:36
clarkbfungi: in this case the submodule had to be updated in concert with the change we made for ${hash} to actually get the hash because davido didn't want to keep the backward compatible method18:39
clarkbbut now looking at gerrit 3.5 I don't see where their plugin/gitiles commits are coming from on the gitiles side18:39
clarkbthey don't seem to line up at all18:39
clarkboh I think I see it18:39
clarkbThey moved plugin gitiles to master from stable-3.518:39
clarkbstable-3.5 would work18:40
clarkbugh18:40
clarkbhrm or did I somehow end up with master checked out as stable-3.5? this whole thing is very confusing18:41
clarkbok for some reason my local stable-3.4 was checked out against master for gerrit, which is why I saw it using master plugins/gitiles. I don't know how that got confused, but I guess I didn't have it set up to track origin/stable-3.5 ugh18:46
clarkbI think I see the problem though18:46
corvusbeginning the merger/executor rolling restarts now18:47
opendevreviewClark Boylan proposed opendev/system-config master: Pull gerrit/plugins/gitiles from stable branch not tag  https://review.opendev.org/c/opendev/system-config/+/83183918:48
clarkbfungi: ^ I think we can rebase onto that. What happened was plugins/gitiles didn't have a stable-3.5 branch until they merged in our bugfix and at that point they branched it18:49
clarkbgood news is the 3.4 build completed and the registry didn't break it so the registry seems to be working at least with one build against it18:49
clarkband I think the lack of the previous branch may have confused my checkouts locally?18:50
fungiahh, cool18:51
clarkbfungi: do you want to rebase or should I?18:54
clarkbor do we want to wait for it to show it works first?18:54
fungii was going to just wait for it to merge19:00
fungibut i can rebase it if you like19:00
fungithe rebase isn't really going to tell us whether the registry fix worked though, until it's actually in the gate19:01
clarkbfungi: it will check the registry in check as well19:02
fungiit was succeeding in check before the registry fix though, only failing in the gate pipeline19:02
clarkbit was just bad luck previously that we hit it more often in the gate (I think because gate node provisioning has priority so more likely to get nodes assigned together)19:02
clarkbya I think that was pure luck19:02
fungiahh, okay19:03
clarkbsince it is a race19:03
clarkband now an ssl connection error to maven trying to download a jar19:15
clarkbeventually we'll start making forward progress19:15
clarkbianw: fungi: I'm not sure the set -o pipefail fix is working for test playbooks https://zuul.opendev.org/t/openstack/build/41d620b615f9460ca45065acb947aea5/log/job-output.txt that should've failed but it succeeded19:26
fungihuh...19:27
clarkblooking more closely at it https://zuul.opendev.org/t/openstack/build/41d620b615f9460ca45065acb947aea5/console#3/1/30/bridge.openstack.org seems to indicate we didn't set pipefail on that command19:27
clarkbbut we updated it to do so19:27
clarkbdid zuul not merge into master before running that?19:27
clarkbOh or maybe the nested ansible is using a literal checkout without the merge?19:27
clarkbthere is a bug here somewhere and I suspect manually rebasing would address it19:28
clarkbthe playbook that is stale is defined as the run playbook for the system-config-run job. I think this may be a zuul bug19:30
clarkbcorvus: ^ fyi we don't seem to be using merged git state when constructing the job playbooks19:31
clarkbhttps://zuul.opendev.org/t/openstack/build/35731d71aafb442c98609d28a94ac4d2/console#3/1/30/bridge.openstack.org which is fungi's gerrit apache update did use an updated job playbook19:33
clarkbwhich implies this isn't a consistent issue as that change predates the set -o pipefail too iirc19:33
clarkbcorvus: maybe a caching bug where we've cached stale info for a change and are not re merging it even though master has moved ahead?19:33
clarkbI'm going to try a recheck and see if this is consistent19:35
fungiyou're sure we didn't just miss adding pipefail to one of the invocations?19:37
corvusclarkb: it looks like https://zuul.opendev.org/t/openstack/build/41d620b615f9460ca45065acb947aea5/console#3/1/30/bridge.openstack.org ran more than a week before https://review.opendev.org/831465 was written and merged19:38
corvusso i think that means we would not expect pipefail to be present in that invocation19:38
clarkboh hrm. Did gerrit somehow serve me a stale version of the change with a zuul +1 verified?19:39
clarkbI've hit refresh and it is gone now which should've been the case when I pushed the new patchset19:39
clarkbugh sorry, I used the zuul summary and didn't notice it was the stale result and gerrit was still showing +1. I did a hard refresh and it is gone19:40
corvusnp.  zuul tests the future, just not like that.19:40
clarkbhttps://zuul.opendev.org/t/openstack/build/fd01e2a702c64e228db0e5b060e211bc is the one for the most recent ps which zuul status reports as failed, but opening the link says it does not exist19:40
clarkbmaybe it hasn't reported yet so the db can't find it19:41
fungiyeah, i think that future may have needed sarah connor19:41
clarkbI have too many pots on the fire right now19:41
ianw... so the conclusion is that it's probably working?19:42
clarkbianw: yes sorry19:42
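For readers following the pipefail thread above, here is a minimal sketch of what `set -o pipefail` changes. The `false | tee /dev/null` pipeline is an illustrative stand-in, not the command from the system-config test playbooks.

```shell
# Illustrative pipeline only; not the command run by the playbooks.

# Without pipefail, a pipeline's status is that of its last command (tee),
# so the failure of `false` is hidden.
without_pipefail=0
bash -c 'false | tee /dev/null' || without_pipefail=$?

# With pipefail, the pipeline fails if any command in it fails.
with_pipefail=0
bash -c 'set -o pipefail; false | tee /dev/null' || with_pipefail=$?

echo "without pipefail: $without_pipefail"
echo "with pipefail: $with_pipefail"
```

This is why a test command piped into a logger can report success even when the test itself failed, unless pipefail is set on that invocation.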
clarkbthough why the build for the latest ps isn't yet available is another mystery19:42
clarkb828586,3 seems maybe stuck in limbo waiting on the paused job to complete19:44
clarkbI guess that could be related to trying to process some events though19:44
corvusbuilds should exist in the db as soon as they start now19:45
clarkbcorvus: hrm that one shows as not existing and is completed19:45
clarkbthe paused build is finishing up now at least so ya likely waiting on an event to be processed19:46
corvusclarkb: your answer is in gerrit now19:47
clarkbI've approved https://review.opendev.org/c/opendev/system-config/+/831839 as it passed testing second time around (yay no more ssl problems)19:47
clarkbcorvus: aha the parent build failed so it couldn't run at all hence no record?19:48
corvus(in a comment on https://review.opendev.org/828586 )19:48
corvusyep.  we could probably try to get more info into the status json for that case.19:48
clarkbthat would be great. Sorry for all  the noise here I really do have too many things on the fire and should start trimming19:49
corvus(but at least it's consistent with the behavior i described -- we get a build record in the db when the build starts -- and this one didn't start :)19:49
clarkb++19:49
corvuswe just forgot failing without starting is an option19:50
clarkbcorvus: maybe even changing failed to not startable would be good19:50
clarkbas the status I mean19:50
corvusyes.  not trivial.19:51
ianwfungi: speaking of logs; https://review.opendev.org/q/topic:system-config-encrypt-logs are two that could use your eye if you have time.  one updates system-config docs to explain adding your keys, the other is an attempt to cover the expiring keys issue you brought up19:55
clarkbI'm going to find lunch but 829975 can be rechecked as soon as 831839 lands19:57
opendevreviewMerged opendev/system-config master: Pull gerrit/plugins/gitiles from stable branch not tag  https://review.opendev.org/c/opendev/system-config/+/83183920:18
clarkbtwo successful image builds on 829975. That is a really good sign20:47
clarkbjentoio: fungi: I +2'd the change but didn't +A it because I'm not in a good spot today to watch it :/ too many things as noted before. I'm hopeful I can +A tomorrow morning though. Or someone else can approve it if they have time20:50
ianwclarkb: hrm, https://zuul.opendev.org/t/zuul/build/9d8641e33e9f4eeab69bb705fb09a664 failed in the "printf "1\n2\n3\n4\n" | xargs -P 4 -I DNE podman push localhost:9000/test/image"20:55
ianwError: error copying image to the remote destination: Error writing blob: Patch "https://localhost:9000/v2/test/image/blobs/uploads/80cd43a81d594870970be61089320839": net/http: TLS handshake timeout20:55
ianwhttps://review.opendev.org/c/zuul/zuul-registry/+/831339 then kicked itself out, didn't appear to have started the job20:59
clarkbianw: tls handshake timeout is interesting21:13
ianwi can't see anything in the zuul-registry logs21:14
clarkbianw: I think that is well above any of the changes I was making and trying to test. Could it be that we just don't have enough entropy in that environment to do that many connections?21:14
ianwthat seems to suggest that the registry-side was not responding21:14
clarkbianw: https://zuul.opendev.org/t/zuul/build/9d8641e33e9f4eeab69bb705fb09a664/log/docker/functionaltest_registry_1.txt that has logs though21:15
clarkbor do you mean you can't see anything related to the failure in the logs?21:15
clarkbya my suspicion is that something a layer in front of us failed and maybe due to lack of entropy?21:15
ianwyeah, nothing relating to the failure, it doesn't mention 80cd43...21:15
ianwhttps://review.opendev.org/c/zuul/zuul-registry/+/831846 will turn that up to debug21:16
clarkb+2 I bet we need cherrypy logs to see what happened there21:16
ianwclarkb: also https://review.opendev.org/c/zuul/zuul-jobs/+/831326 is required for that focal stack, if you could poke at that one21:17
ianwjust installs the containernetworking package to stop it warning about missing plugins21:17
clarkblooking21:17
ianwi should actually unstack that DEBUG one21:18
clarkbhow did the centos-8 jobs stay in there? I thought I cleaned them all up21:18
ianwyeah, i guess we just missed a grep21:19
clarkb829975 has entered the gate21:25
clarkblooks like gitea 1.16.3 doesn't fully fix problems with complex diffs for deleted/renamed files https://158.69.67.50:3081/opendev/system-config/commit/1d5f5a7657bd6c6c4af7506d1f3dd3aa9a5187bc but https://158.69.67.50:3081/opendev/system-config/commit/25cdc979507f1b3ec68781a541c0b196bd451f2f does look a bit better than before21:34
clarkbhttps://158.69.67.50:3081/opendev/system-config/commit/8f8100ed28d15bdad935b82dbfd6bb2d35203614 looks better too21:35
clarkbLooks like 1.15.11 struggles with the first example too so that isn't a regression https://opendev.org/opendev/system-config/commit/1d5f5a7657bd6c6c4af7506d1f3dd3aa9a5187bc21:36
*** dviroel is now known as dviroel|out21:37
ianwclarkb: https://zuul.opendev.org/t/zuul/build/9906bc439b334fbaa5cdf20d05b11b3a/logs -- i guess in the success case we don't collect any logs, because the testing registry has already exited.  i'm not sure if this is a bug or feature22:11
clarkbianw: oh interesting. I would consider that a bug22:13
ianwit would probably be better to save individual logs for each test22:16
*** dviroel|out is now known as dviroel22:21
opendevreviewMerged opendev/system-config master: Block access to Gitiles  https://review.opendev.org/c/opendev/system-config/+/82997522:22
clarkbthat was a long time coming22:22
clarkbfungi: ^ finally22:23
fungiyay!22:38
fungibut so much more satisfying22:38
clarkbcorvus: infra-prod-service-review is waiting on semaphore infra-prod-playbook, I don't see any other deploy or opendev-prod-hourly deployment jobs that might be holding it.22:54
clarkbWhen that job was enqueued there were opendev-prod-hourly jobs running with that semaphore held. I wonder if we didn't unlock?22:55
clarkbI think we have a cleanup routine to find those and remove them though I don't recall how often they run22:55
clarkbbut calling it out in case we get unstuck by the cleanup routines and this gets missed as it is possibly a bug22:56
corvusthe semaphore cleanup routine runs very frequently (5m i think)23:00
corvusto avoid such issues23:00
clarkbya it just started23:00
clarkbya not sure how important it is to try and track down those instances to see if they are fixable23:00
corvusclarkb: do you think it was an error or just a delay?23:00
clarkbcorvus: other pipelines were being processed at the time (check was adding jobs) but I guess we process them separately and possibly on different schedulers. But the queue values were all 0 for some time too23:01
clarkbI suppose it is possible the delay was due to not getting a scheduler to process the pipeline as they were busy with other pipelines23:01
corvuswe "only" have 2 :)23:02
corvus2022-03-03 22:36:05,760 INFO zuul.zk.SemaphoreHandler: [e: a043c82fc92c4d3d9eca877e959371d6] Semaphore /zuul/semaphores/openstack/infra-prod-playbook released for {'buildset_path': '/zuul/tenant/openstack/pipeline/opendev-prod-hourly/item/3079ee12761f47bc83576a0d2260829b/buildset/a723807d99cb4ee78eb4427cfdfcd7d9', 'job_name': 'infra-prod-service-eavesdrop'}23:03
corvus2022-03-03 22:59:44,546 INFO zuul.zk.SemaphoreHandler: [e: c7a93cc257a34d6282c555f51f02fbca] Semaphore infra-prod-playbook acquired: job infra-prod-service-review, item <QueueItem 23df53f0d1964ae7a7c5dd0cba6660e4 for <Change 0x7f234c182970 opendev/system-config 829975,3> in deploy>23:03
corvusis that the sequence you're looking at?23:03
clarkbcorvus: ya23:04
corvushypothesis: the semaphore release in pipeline prod-hourly did not trigger a pipeline run of deploy23:04
corvus(and deploy just sat there waiting for a triggering event)23:04
clarkboh! since that is a bit more optimized now23:04
corvusya23:04
corvusshould be able to confirm by looking for zero deploy pipeline runs in that timeframe23:05
clarkbI guess that would be another condition to check for setting the refreshed flag?23:05
clarkb"any pending locks" though I'm not sure how easy that is to do23:05
clarkbfungi: I think the url handling for gitiles is in place now if you want to double check it23:07
corvusthe logs confirm the hypothesis23:08
corvusi think fixing this will be tricky23:11
corvus-> #zuul23:11
fungiwget https://review.opendev.org/plugins/gitiles/opendev/system-config/23:14
fungiERROR 403: Forbidden.23:14
fungilgtm23:14
clarkbsuccessful failure23:16
fungithat's roughly what it said beneath my yearbook photo too23:18
*** rlandy|ruck is now known as rlandy|out23:36
NeilHanlonmnasiadka: it looks like the latest build worked on nb01.opendev.org, so I think we're nearly there for rocky 8 nodes23:46
NeilHanlonwas away this week so I've not been paying as much attention unfortunately23:46
clarkbNeilHanlon: ya I think it's largely at a "use it and see what breaks" point now23:48
NeilHanlonack, thanks clarkb :) also turns out you were right about the node definition in jobs.yaml needing the name: key despite the doc's insistence otherwise heh23:53
clarkbinfra-root I've noticed that some airship jobs are hitting node failures due to the removal we did. Not much we can do about that, but thought I'd mention it23:59
