clarkb | yay | 00:00 |
opendevreview | Merged opendev/zone-opendev.org master: Clean up DNS for old Rackspace Flex SJC3 mirror https://review.opendev.org/c/opendev/zone-opendev.org/+/943196 | 00:08 |
clarkb | fungi: the selenium image switch is going to pass check, any chance you have time to review that really quickly? It should be a test only change but reduces our reliance on docker hub | 00:12 |
clarkb | 943326 | 00:12 |
clarkb | I spotchecked the grafana job and there are still screenshots like we expect | 00:13 |
opendevreview | Clark Boylan proposed opendev/system-config master: Upgrade gitea to v1.23.5 https://review.opendev.org/c/opendev/system-config/+/943352 | 00:19 |
clarkb | and gitea just made a 1.23.5 release | 00:19 |
corvus | zuul-nox-linters SUCCESS in 1m 51s | 00:30 |
corvus | that is a very acceptable runtime :) (that's on the new small flavor) | 00:30 |
corvus | typical is 10m+ | 00:30 |
clarkb | nice. I do think the raxflex nodes are a bit zoomier so that may explain the runtime but ya I expect linting doesn't gain much from larger nodes it just spins one cpu | 00:31 |
clarkb | using the method described here: https://alexypulivelil.medium.com/docker-hub-rate-limits-0bff3bd3ead1 it looks like I'm still getting a rate limit of 100 per 21600 seconds (6 hours) so maybe docker hub hasn't changed the limits yet... | 00:39 |
clarkb | thinking out loud: I wonder if we should consider adding that sort of lookup to the beginning of our jobs that use docker images | 00:43 |
clarkb | to start collecting real data where we are using docker | 00:43 |
fungi | does querying that count against the quota? ;) | 00:44 |
tonyb | Yeah that seems like it'd be easy to add | 00:44 |
tonyb | I'll add something today if you don't beat me to it | 00:45 |
clarkb | fungi: it showed ratelimit-remaining: 100;w=21600 so no I don't think so | 00:45 |
clarkb | should've been 99;w=21600 if it did I think | 00:46 |
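[The `ratelimit-remaining: 100;w=21600` value quoted above is Docker Hub's `<count>;w=<window-seconds>` header format. A minimal parser for that format — an illustrative sketch, not part of any tooling discussed here:]

```python
def parse_ratelimit(value):
    """Parse a Docker Hub ratelimit header value like '100;w=21600'.

    Returns (remaining_pulls, window_seconds).
    """
    count_part, _, window_part = value.partition(";")
    window = int(window_part.split("=", 1)[1]) if window_part else 0
    return int(count_part), window

# 100 pulls remaining in a 21600s (6 hour) window, matching the quote above
print(parse_ratelimit("100;w=21600"))  # -> (100, 21600)
```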
clarkb | tonyb: go for it. I don't think it is a rush. I'm just trying to figure out where we stand and that's more of a longer term thing | 00:46 |
fungi | i mean, reasonably checking your request quota shouldn't count against your usage, but this is dockerhub we're talking about so i had to ask | 00:47 |
tonyb | That's a fair question | 01:46 |
tonyb | Do we have any projects that supply docker credentials via zuul secrets? They'd get their own quota and the generic check won't catch that | 01:47 |
Clark[m] | We don't. It was suggested as a possible workaround when this started to get worse not sure if anyone implemented it | 02:24 |
opendevreview | Tony Breeds proposed opendev/system-config master: Add option to force docker.io addresses to IPv4 https://review.opendev.org/c/opendev/system-config/+/943216 | 03:09 |
opendevreview | Tony Breeds proposed opendev/system-config master: Add option to force docker.io addresses to IPv4 https://review.opendev.org/c/opendev/system-config/+/943216 | 04:06 |
opendevreview | Tony Breeds proposed opendev/system-config master: Add option to force docker.io addresses to IPv4 https://review.opendev.org/c/opendev/system-config/+/943216 | 05:07 |
tonyb | ^^ I think that one is "good enough" | 05:15 |
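[The actual approach is in 943216 above; one common way to force IPv4 for a registry hostname is to resolve only A records and pin them in `/etc/hosts`. An illustrative sketch — the helper names here are hypothetical, not taken from the change:]

```python
import socket

def ipv4_addrs(host, port=443):
    """Resolve only A records (AF_INET), so AAAA results are excluded."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def hosts_lines(host, addrs):
    """Render /etc/hosts pinning lines for the resolved IPv4 addresses."""
    return ["{} {}".format(addr, host) for addr in addrs]

# e.g. hosts_lines("registry-1.docker.io", ipv4_addrs("registry-1.docker.io"))
```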
*** ralonsoh_ is now known as ralonsoh | 09:03 | |
*** ykarel_ is now known as ykarel | 10:44 | |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add public url attribute https://review.opendev.org/c/zuul/zuul-jobs/+/834043 | 12:29 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add public url attribute https://review.opendev.org/c/zuul/zuul-jobs/+/834043 | 12:30 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add public url attribute https://review.opendev.org/c/zuul/zuul-jobs/+/834043 | 12:36 |
clarkb | fungi: frickler: last call on https://review.opendev.org/c/opendev/system-config/+/942439 I'd like to approve that at ~16:30 so that I can get it in before the 17:00 UTC hourly runs. (16:00 UTC is just a bit too early for me still with needing tea and breakfast and loading ssh keys etc) | 15:47 |
clarkb | but also if you find a concern we can hold off. That change is fairly central to our infra prod deployment jobs so scrutiny is a good thing | 15:49 |
clarkb | I guess tonyb might also be in a timezone that works for reviewing that? Not sure based on timestamps for changes from last night | 15:51 |
fungi | lgtm | 15:53 |
fungi | also plays nicely with the new zuul feature corvus pointed out yesterday | 15:55 |
clarkb | ya though in this case I think we always want to run the parent pausing job anyway so it's a bit of an extra special case that doesn't optimize with the new feature | 15:57 |
*** gthiemon1e is now known as gthiemonge | 15:59 | |
clarkb | hrm I've just noticed my irc VM appears to have a clock that got too far ahead | 16:11 |
clarkb | oh no its my local machine whose clock is too far behind | 16:11 |
clarkb | yay for having redundant clocks I guess | 16:11 |
fungi | i use tmux in my sessions on every one of my systems i stay connected to, and they all have the date and time on the status bar with second resolution, so i can see at a glance if one of them is incorrect | 16:13 |
clarkb | in my case xfce's clock widget in the panel didn't match my weechat clock and I assumed it was weechat at fault as I had a couple of blips with it recently. But no my local clock was wrong. And checking it appears I had no ntp servers set | 16:18 |
clarkb | anyway I set up an ntp server and asked it to sync and now the clocks all agree | 16:18 |
clarkb | a recent update must | 16:18 |
clarkb | er must've cleared out the ntp config or something and it was long enough to skew (though I was off by 20 minutes which seems like a lot) | 16:19 |
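[On a systemd-based machine, "setting up an ntp server" as described above typically means pointing systemd-timesyncd at a pool server; an illustrative config, assuming timesyncd is in use — chrony and ntpd use different files:]

```ini
# /etc/systemd/timesyncd.conf
[Time]
NTP=0.pool.ntp.org 1.pool.ntp.org
```

[followed by `timedatectl set-ntp true` to enable synchronization and `timedatectl timesync-status` to confirm the skew is gone.]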
fungi | some of my portable machines seem to lack a bios backup battery, so if the main battery runs all the way down they come back up with an ancient date | 16:25 |
clarkb | washington.edu's ntp server is called bigben | 16:26 |
clarkb | some other university has rolex. (I just went with one of the pool servers) | 16:26 |
fungi | hah | 16:27 |
clarkb | alright I'm going to approve the infra-prod runtime management change. If we decide there is a problem we can block ssh access to bridge from zuul as the short term mitigation and then revert the change | 16:30 |
clarkb | things to check: that system-config updates in the first bootstrap job, the bootstrap job pauses, then each subsequent job runs one at a time, finally the pause job is the last thing to complete | 16:30 |
clarkb | if those things hold true then I suspect we're in good shape | 16:30 |
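[The ordering being checked here comes from a paused parent job plus hard dependencies; in Zuul config that looks roughly like the sketch below — job names are from the discussion, the semaphore name is hypothetical:]

```yaml
- job:
    name: infra-prod-bootstrap-bridge
    semaphores: infra-prod-deploy   # held for the whole buildset while paused

- job:
    name: infra-prod-service-bridge
    dependencies:
      - name: infra-prod-bootstrap-bridge
        soft: false   # hard dependency: only run after bootstrap succeeds
```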
clarkb | the change is on track to merge before the 1700 hourly jobs | 16:42 |
opendevreview | Merged opendev/system-config master: Bootstrap-bridge as top-level job https://review.opendev.org/c/opendev/system-config/+/942439 | 16:46 |
clarkb | bridge system config is currently checked out to a5487001b33616d35e14edec33b4cf84ee24ed65 but should checkout to ^ when hourlies start and stay on that checkout the entire time | 16:47 |
fungi | the selenium image source change is having really rotten luck | 16:53 |
clarkb | fungi: ya I think 943216 will help with that | 16:53 |
clarkb | so maybe we just get 943216 into a mergable spot (I'm happy with it but did have a couple of clarification questions) | 16:53 |
clarkb | merge that one then update the selenium image | 16:53 |
clarkb | actually just had a thought on that one | 16:59 |
clarkb | I'll have to update my review after monitoring the hourlies that are about to start | 16:59 |
clarkb | hourlies just started. Looks like infra-prod-service-bridge may be trying to run at the same time as bootstrap bridge | 17:02 |
clarkb | that is unexpected but I don't think it is a problem in this case? | 17:02 |
clarkb | oh and bootstrap bridge failed | 17:02 |
clarkb | ERROR! 'zuul_return' is not a valid attribute for a Play | 17:02 |
clarkb | I'll get a revert up | 17:03 |
clarkb | the other jobs are running sequentially and system-config did update because the pause request happens at the very end of the bootstrap job | 17:04 |
clarkb | so I think the currently running jobs are ok. but we would potentially have conflicts between pipelines without a revert due to a lack of a pause | 17:04 |
opendevreview | Clark Boylan proposed opendev/system-config master: Revert "Bootstrap-bridge as top-level job" https://review.opendev.org/c/opendev/system-config/+/943413 | 17:05 |
fungi | i guess we all missed that, sorry! :/ | 17:06 |
clarkb | infra-root ^ quick turnaround of that would be good then we'll have to sort out how to pause the job properly. (I think we just have to move the zuul_return ansible context into zuul level context) | 17:06 |
fungi | i've already approved it | 17:06 |
clarkb | thanks | 17:07 |
clarkb | oh this is an ansible bug not an ansible context issue. That playbook is the run playbook for zuul | 17:08 |
clarkb | I think I see how to fix it. | 17:08 |
clarkb | I think we should still revert to get back to a safe spot. Then followup with a reapplication after review again | 17:08 |
clarkb | that way we don't feel like we need to rush etc | 17:08 |
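[For the record, `zuul_return` is an Ansible module, so it is only valid inside a task list, never as a play-level attribute — hence the error above. The pause pattern from the Zuul docs looks roughly like:]

```yaml
# Broken: a module name used as a play-level keyword
# - hosts: localhost
#   zuul_return: ...

# Fixed: invoke the module from a task at the end of the run playbook
- hosts: localhost
  tasks:
    - name: Pause this job until the child jobs in the buildset finish
      zuul_return:
        data:
          zuul:
            pause: true
```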
opendevreview | Clark Boylan proposed opendev/system-config master: Reapply "Bootstrap-bridge as top-level job" https://review.opendev.org/c/opendev/system-config/+/943414 | 17:15 |
clarkb | I'm going to mark ^ WIP because it only fixes the zuul_return pause issue. It doesn't address infra-prod-bootstrap-bridge and infra-prod-service-bridge starting at the same time. I suspect the reason for that is dependencies are not inherited in the way we expect | 17:16 |
clarkb | but needs further investigation. I'll put notes from what I observed in a comment | 17:16 |
clarkb | ok wip'd | 17:19 |
clarkb | the revert is in the gate now. It should be able to merge before the next hourlies. We haven't landed any other new system-config code so it's probably fine if the next hourlies run on the broken code | 17:27 |
corvus | clarkb: `It almost seems like defining a dependency in a parent change isn't inherited by the child change?` | 17:34 |
corvus | clarkb: did you mean s/change/job/? | 17:34 |
clarkb | corvus: yes I did | 17:35 |
corvus | ack thinking | 17:36 |
corvus | https://imgur.com/a/k2wQvWg | 17:36 |
corvus | just to get a record of that since it's about to revert | 17:37 |
clarkb | hrm infra-prod-base should have the hard dependency on infra-prod-bootstrap-bridge | 17:37 |
clarkb | via infra-prod-playbook | 17:38 |
clarkb | looking at the graph I do wonder if each job must define its own dependencies. The relationships rendered there look like ones that we've added to each named job and not their parents | 17:39 |
clarkb | Which I can update the change to do. It will be a lot more verbose but doable. | 17:40 |
corvus | not yet pls | 17:40 |
clarkb | ack | 17:40 |
opendevreview | Merged opendev/system-config master: Revert "Bootstrap-bridge as top-level job" https://review.opendev.org/c/opendev/system-config/+/943413 | 17:42 |
fungi | and let me just say again how amazing the job graph feature is, especially for situations like this | 17:42 |
corvus | i think this is a bug | 17:43 |
clarkb | exciting! | 17:43 |
corvus | oh wait, no; at least the thing i thought i saw i was wrong about | 17:44 |
corvus | https://etherpad.opendev.org/p/v7IBR8NCgGcHloqA5bNm | 17:52 |
corvus | i made a unit test that does that ^ and inspected the dependencies and got what i expected | 17:53 |
corvus | that's the scenario, right? | 17:53 |
clarkb | yes though I think there may be a third intermediate job in the tree | 17:53 |
corvus | that did it | 17:54 |
clarkb | job inheritance is infra-prod-service-bridge -> infra-prod-service-base -> infra-prod-base -> infra-prod-playbook | 17:54 |
corvus | wait no, that didn't do it... | 17:55 |
corvus | okay i'll add one more | 17:55 |
clarkb | actually no its only three | 17:55 |
clarkb | service-bridge -> service-base -> playbook | 17:56 |
clarkb | https://zuul.opendev.org/t/openstack/build/f85e6c92785d485c9dd50814d4a60dd0/log/zuul-info/inventory.yaml#12-18 is the full inheritance path beyond playbook but playbook is where the hard dependency lives | 17:56 |
clarkb | corvus: could it be the variant in project.yaml is overriding the dependency from the parent and not merging them? | 17:57 |
corvus | possibly; let me get the basic setup then i'll check that | 17:57 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/943414/1/zuul.d/project.yaml line 366 is where the variant defines a new dependency | 17:58 |
corvus | okay, so that's the 3 level hierarchy | 17:59 |
corvus | oh and yeah, that would definitely do it; easy fix though let me write it | 17:59 |
clarkb | corvus: feel free to update the reapply change | 18:00 |
clarkb | the 1800 hourlies are about to start I'll watch them | 18:00 |
clarkb | so far they look good to me. Only one job is running so revert appears to have applied (didn't expect it not to. I'm just double checking) | 18:02 |
corvus | clarkb: ok the gist is that dependencies override by default, so the ppc config overrides what's in the job. there's 2 ways to fix it: | 18:07 |
corvus | 1) remove the dependency from infra-prod-playbook and explicitly add it to every entry in the ppc for a job that inherits from it | 18:07 |
corvus | 2) leave it in place but set !inherit for all the relevant dependencies in the ppc | 18:08 |
corvus | i think we're probably changing the same parts of the ppc either way -- i think it would be every job that inherits from infra-prod-playbook | 18:08 |
corvus | so i think it's a matter of what's the most clear to us | 18:09 |
corvus | as cool as i think !inherit is -- i wonder if we should just spell out all the deps in the ppc explicitly here so it's more clear what's going on | 18:09 |
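[Sketched out, the two options look roughly like this — job names are from the discussion, the sibling entry is hypothetical, and `!inherit` is Zuul's override-control YAML tag for merging a variant's value with the inherited one instead of replacing it:]

```yaml
# Option 1: restate the dependency explicitly in the project-pipeline config,
# since a PPC entry overrides the job's own dependencies by default
- infra-prod-service-bridge:
    dependencies:
      - infra-prod-bootstrap-bridge
      - infra-prod-service-x   # hypothetical sibling ordering

# Option 2: keep the dependency on the parent job and merge rather than override
- infra-prod-service-bridge:
    dependencies: !inherit
      - infra-prod-service-x   # hypothetical sibling ordering
```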
clarkb | ya the original setup put everything in project.yaml to try and make things explicit which has me leaning towards 1) as 2) is maybe a bit more magical | 18:09 |
clarkb | corvus: ya exactly | 18:09 |
corvus | okay, i'll finish that edit up | 18:09 |
clarkb | the reason I wanted to centralize this one dependency is I always want this job to run first but if we have to be explicit about that in project.yaml that's fine too | 18:10 |
corvus | yeah, and no way to make that "inherit" sticky | 18:10 |
fungi | i too favor the explicitness of listing all the deps over introducing any complex negation. much easier to figure out what's going on that way | 18:12 |
clarkb | as an alternative we could move everything into the job definitions but then we lose some flexibility and it is harder to read as the jobs as a whole are even more verbose | 18:12 |
clarkb | so ya 1) seems reasonable | 18:12 |
fungi | unrelated, we're under a tornado/waterspout and flood warning for the next 7 hours, very intense front rolling through and out to sea, so if i drop off it's likely loss of power or internet | 18:14 |
clarkb | good luck! Maybe offer a beer to the sea or something | 18:15 |
fungi | i have a few cases chilling in the garage, so if the sea wants any beer it'll just take those | 18:16 |
opendevreview | James E. Blair proposed opendev/system-config master: Reapply "Bootstrap-bridge as top-level job" https://review.opendev.org/c/opendev/system-config/+/943414 | 18:16 |
corvus | maybe that? unless i went cross-eyed :) | 18:17 |
clarkb | that looks about right. It's helpful we already have the yaml substitution stuff in place for most jobs | 18:18 |
corvus | amorin: we discussed removing max_on_host at our meeting yesterday and we think that march 17 or later would be the best time for a maintenance like that. does that work for you? if needed, we can move it earlier. | 18:19 |
clarkb | corvus: there is one comment update I posted about on the change. Not sure if you want to edit that or if I should. But otherwise I think that lgtm | 18:19 |
corvus | i can fix | 18:20 |
opendevreview | James E. Blair proposed opendev/system-config master: Reapply "Bootstrap-bridge as top-level job" https://review.opendev.org/c/opendev/system-config/+/943414 | 18:20 |
fungi | lgtm | 18:21 |
clarkb | same here. Do we want to try and land it and then monitor the 1900 or 2000 UTC hourlies? | 18:22 |
clarkb | The sun came out today so I am hoping for a bike ride this afternoon but 2000 should be plenty early if we want to proceed and try again | 18:22 |
fungi | i'm cool with that | 18:23 |
clarkb | corvus: did the zuul_return fixup look correct to you? | 18:23 |
clarkb | if so I think we can probably approve and try again | 18:24 |
fungi | i'll need to step away to cook dinner circa 21:30 utc or so | 18:24 |
fungi | but that's plenty of time | 18:24 |
clarkb | it's still in check for now so no rush to approve. But ya would be good if corvus can weigh in on the other aspect of the fix if possible before we proceed | 18:29 |
clarkb | just passed check but if approved right now I think it is 50:50 if it applies for 1900 anyway. So I'll hold off on approving until 19:20 or so | 18:41 |
corvus | clarkb: lgtm | 18:41 |
clarkb | in that case I may as well approve it now and see if we make it in | 18:43 |
clarkb | estimated merge time is 14 minutes so very likely will miss the 1900 hourlies. I'll keep an eye on 1900 and 2000 either way | 18:50 |
clarkb | 1900 looks fine and change still hasn't merged. | 19:06 |
opendevreview | Merged opendev/system-config master: Reapply "Bootstrap-bridge as top-level job" https://review.opendev.org/c/opendev/system-config/+/943414 | 19:09 |
clarkb | hrm ^ merging triggered the bootstrap bridge job while hourlies are still running. I don't anticipate that to be a problem (it should just update system-config) | 19:12 |
clarkb | but something to consider for future updates: when we change the locking and exclusion mechanisms the old setup won't necessarily exclude the new setup so timing could be important | 19:12 |
clarkb | good news is that confirms the zuul_return fix: https://zuul.opendev.org/t/openstack/build/f16f61e8b8884b15a0339bfc947a798d/log/job-output.txt#322-323 | 19:13 |
clarkb | or at least it does when no other jobs are enqueued (in this case the job got enqueued because we modified its playbook) | 19:13 |
amorin | corvus: I will check with the team tomorrow about march 17 and let you know. | 19:21 |
clarkb | ~5 minutes to the second attempt at refactoring infra-prod job runs. I'm paying attention to it | 19:55 |
clarkb | jobs just enqueued | 20:01 |
clarkb | bootstrap bridge is running. service-bridge is waiting | 20:01 |
fungi | so far so good | 20:01 |
clarkb | yup next we should see bootstrap-bridge pause then service-bridge start | 20:02 |
clarkb | bootstrap paused | 20:03 |
clarkb | I think this is working | 20:03 |
clarkb | the next big question is how the daily periodic runs will handle it those enqueue at around 0200 | 20:05 |
fungi | perfect | 20:08 |
ianw | cool! didn't expect the job inheritance but makes sense | 20:09 |
clarkb | tomorrow if things still look good we can update the max on the second semaphore to 2 and see how things handle running in parallel | 20:10 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run infra-prod jobs in parallel https://review.opendev.org/c/opendev/system-config/+/943488 | 20:20 |
clarkb | there is the change for that. I don't think we should merge that until after periodic jobs today | 20:21 |
clarkb | https://zuul.opendev.org/t/openstack/build/5e269975631a4d1bbf140d059d03374a/log/job-output.txt#322-323 this shows the paused section and the unpause timestamp is after https://zuul.opendev.org/t/openstack/build/286bef65c181467e9ca95ce9d1fd2087 completes so I think that means we paused for the whole buildset runtime | 20:25 |
clarkb | infra-root I think the next things on my agenda are getting tonyb's forced ipv4 for docker change into shape, landing that then we should be able to update the selenium container image location and upgrade gitea to 1.23.5 | 20:41 |
clarkb | tonyb: not sure if you are still in transit or what your general hours are but I had a couple questions and one specific suggestion. Nothing major on that change but enough that a new patchset might be worthwhile | 20:41 |
tonyb | Cool beans. Just landed at LAX. I'll look when I can | 20:42 |
fungi | if things are mostly settled down i can start pushing through the flex project switchover changes this evening | 20:43 |
clarkb | ah ok in transit then. tonyb I can push the update I suggested if you prefer. Also none of this is super urgent. But I think your change should make everything else less flaky hence the order of operations | 20:43 |
fungi | i'll go ahead with 943065 and 943076 to wind down the old sjc3 project usage in both np and zl, then proceed with 942231 once i confirm nodes and images are gone | 20:50 |
clarkb | sounds good | 20:50 |
fungi | i'll also delete the old mirror01.sjc3.raxflex since we haven't seen any issues using 02 | 20:55 |
opendevreview | Merged opendev/zuul-providers master: Reapply "Wind down/clean up Rackspace Flex SJC3 resources" https://review.opendev.org/c/opendev/zuul-providers/+/943076 | 20:56 |
fungi | server list shows the old mirror gone now | 21:01 |
clarkb | the next hourly jobs have started and when 943065 lands we should make sure that is all still happy | 21:02 |
clarkb | I think we actually need to update project-config to run infra-prod-bootstrap-bridge like we do in system-config | 21:03 |
opendevreview | Merged openstack/project-config master: Wind down/clean up Rackspace Flex SJC3 resources https://review.opendev.org/c/openstack/project-config/+/943065 | 21:04 |
clarkb | the secondary semaphore is preventing ^ from running the one job it has enqueued | 21:05 |
clarkb | so I think we're still safe though I'm not sure that we'll update the git repos properly because bootstrap bridge does that | 21:05 |
clarkb | fungi: just fyi that 943065 may not apply until the 2200 utc hourly jobs | 21:05 |
clarkb | and i'm working on a fix for that now | 21:05 |
clarkb | infra-root fyi we should avoid creating new projects in gerrit until we're happy with these pipeline changes. I think right now they won't necessarily do what we want | 21:09 |
opendevreview | Clark Boylan proposed openstack/project-config master: Use the new infra-prod locking and dependency setup https://review.opendev.org/c/openstack/project-config/+/943494 | 21:13 |
clarkb | infra-root ^ I think that is the fix. basically we have to run infra-prod-bootstrap-bridge when we run any other infra-prod jobs and then set up the hard dependency from those other jobs to that job | 21:13 |
clarkb | I think we are currently ok as far as the way things are setup we just won't actually apply project-config updates until the related jobs either run in hourly or periodic runs or system-config change merges deploy them | 21:16 |
clarkb | so we should avoid merging any more project-config changes until 943494 or something like it lands | 21:17 |
clarkb | and then maybe merge a safer project-config change like a max-servers update for nodepool or something to canary it after it lands | 21:17 |
clarkb | the deploy job for 943065 is running now. It is probably worth checking my earlier assumptions hold on nl01 and the builders | 21:21 |
clarkb | I'll wait for the job to finish then check nl01 | 21:21 |
clarkb | are there any other places we trigger infra-prod jobs? I don't think so | 21:23 |
clarkb | yup my assumption appears to have been correct. Looking at nl01's /etc/nodepool/nodepool.yaml the raxflex provider still has max-servers of 32 and the diskimages list is not empty | 21:27 |
clarkb | so ya waiting for hourly deployment of nodepool should correct that in ~45 minutes | 21:27 |
clarkb | and then 943494 will hopefully address the problem for future project-config updates | 21:27 |
fungi | i didn't see any others | 21:28 |
clarkb | 943494 passed check testing at least. I guess we can approve that either when I get back or tomorrow morning. Then land a project-config update to exercise it | 21:36 |
clarkb | or if you want to send it approving it now is fine too I just won't be able to monitor it. | 21:37 |
clarkb | worst case in any scenario we revert the system-config update again and roll out everything together next time | 21:37 |
fungi | i'm stepping back for a bit to start dinner prep, but should be around later while working through the flex changeover | 21:39 |
opendevreview | Stephen Reaves proposed openstack/diskimage-builder master: Enable custom overlays https://review.opendev.org/c/openstack/diskimage-builder/+/943500 | 22:01 |
tonyb | clarkb: go ahead and push it. I've made the changes locally but SSH is blocked on this wifi :/ | 22:17 |
opendevreview | Stephen Reaves proposed openstack/diskimage-builder master: Enable custom overlays https://review.opendev.org/c/openstack/diskimage-builder/+/943500 | 22:42 |
fungi | seeing that /etc/nodepool/nodepool.yaml hasn't updated on nl01 yet | 22:44 |
fungi | checking to see if the hourly run got delayed | 22:45 |
fungi | infra-prod-service-nodepool ran at 22:04:30 in opendev-prod-hourly: https://zuul.opendev.org/t/openstack/build/fcc4bece0766460bb7735c4297323a75 | 22:47 |
fungi | and infra-prod-bootstrap-bridge started before it in the same buildset | 22:48 |
Clark[m] | fungi: look at the bootstrap job logs in zuul it should sync system and project configs. But then check the git repos on bridge I guess? | 22:51 |
Clark[m] | I stopped to check grafana and noticed max servers was still 32... | 22:52 |
Clark[m] | Fwiw if you think we need to revert I'm ok with that. Still a way from home though | 22:52 |
fungi | ~/src/opendev.org/openstack/project-config on bridge is 407cb2a Normalize projects.yaml (Date: Fri Feb 28 02:25:48 2025 +0000) | 22:55 |
fungi | so still a week old | 22:55 |
fungi | that's ~zuul/src/opendev.org/openstack/project-config to be clear | 22:55 |
fungi | i'm not sure a revert is urgent, just trying to work out where in the bootstrap job it pushes an updated openstack/project-config state to bridge | 22:57 |
Clark[m] | I just remembered that nodepool Ansible renders its config dynamically. Maybe that is related? | 22:57 |
Clark[m] | Were there other changes after February 28 we are missing? | 22:58 |
Clark[m] | Looks like no | 22:58 |
Clark[m] | fungi: https://zuul.opendev.org/t/openstack/build/f145c6af941c44f984dbbe90178526d1/log/job-output.txt#109 | 22:59 |
Clark[m] | That is where it should update but it decided not to for some reason | 23:00 |
fungi | the only condition i see for that task is when: '"opendev.org/openstack/project-config" in zuul.projects' | 23:03 |
fungi | bingo, it's not: https://zuul.opendev.org/t/openstack/build/f145c6af941c44f984dbbe90178526d1/log/zuul-info/inventory.yaml#112 | 23:04 |
fungi | do we need to add it to required-projects? | 23:06 |
fungi | since it won't be automatically present in the context of a timer trigger | 23:06 |
fungi | or perhaps just drop the when clause" | 23:13 |
fungi | s/"/?/ | 23:13 |
Clark[m] | I wonder if it's that way so that only project config merges can update it to avoid race conditions between system config and project config | 23:27 |
Clark[m] | If that is the case then my other fix is probably the way forward then just land some noopy change to trigger nodepool updates? | 23:27 |
Clark[m] | Maybe the git log or Gerrit review comments shed light on that | 23:28 |
Clark[m] | I suspect our two options are to roll forward with my other fix and then land a noopy nodepool config update or revert and land a noopy config update | 23:32 |
Clark[m] | If we revert we can pick this up tomorrow. I just don't want to be an impediment to your work while we sort out infra prod parallel execution | 23:32 |
fungi | the when condition came along with the project-config sync in https://review.opendev.org/c/opendev/base-jobs/+/862853 infra-prod: Move project-config reset into base-jobs | 23:33 |
fungi | commit message doesn't explain the reason for the condition, but it may have been present in the task it was copied from | 23:33 |
fungi | i'm fine rolling forward with your fix in 943494 | 23:34 |
fungi | went ahead and approved it | 23:34 |
Clark[m] | It's odd because we only run periodic in system config so that condition can never match | 23:34 |
Clark[m] | Ack. I'm just about back home and everything. So can help with further discovery in a bit | 23:36 |
opendevreview | Merged openstack/project-config master: Use the new infra-prod locking and dependency setup https://review.opendev.org/c/openstack/project-config/+/943494 | 23:45 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: A comment tweak to trigger nl01 config deployment https://review.opendev.org/c/openstack/project-config/+/943508 | 23:50 |
fungi | i'm self-approving that ^ | 23:50 |
clarkb | +2 from me. You were quicker getting that ready than I was ready to review it | 23:54 |
fungi | just trying to speed up the earlier config change taking effect since it'll take time for the nodes in use to die off | 23:55 |
clarkb | fungi: infra-prod-service-nodepool does have openstack/project-config in its required projects | 23:55 |
fungi | yeah, but the conditional was being evaluated in the bootstrap job | 23:56 |
clarkb | so now I'm extra confused. The changes we've made today don't touch any of that and I wonder if this was some latent bug and we only ever updated project-config when landing changes to it | 23:56 |
clarkb | fungi: oh! | 23:56 |
clarkb | now I'm up to speed | 23:56 |
clarkb | ok in that case I think we do want required projects for both system-config and project-config in bootstrap-bridge | 23:57 |
clarkb | because right now bootstrap bridge can only update one or the other | 23:57 |
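[The fix described here amounts to adding both repos to the bootstrap job's required-projects, so they appear in `zuul.projects` — and the sync task's `when` condition matches — even on timer-triggered runs; roughly:]

```yaml
- job:
    name: infra-prod-bootstrap-bridge
    required-projects:
      - opendev/system-config
      - openstack/project-config
```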
fungi | makes sense | 23:57 |
clarkb | (depending on where the job is enqueued from) and this is a regression from before because before every project was updating git repos | 23:57 |
clarkb | there are four repos we rely on | 23:58 |
clarkb | I still think we are ok as long as we're careful to not merge too much stuff until things have resolved | 23:59 |
clarkb | and the basis for that is we shouldn't be rolling anything back due to these bugs. The bugs are all about rolling things forward when we expect to roll them forward | 23:59 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!