opendevreview | Ian Wienand proposed opendev/glean master: Revert "Add option to ignore config drive interfaces info" https://review.opendev.org/c/opendev/glean/+/843225 | 01:15 |
---|---|---|
opendevreview | Ian Wienand proposed opendev/glean master: write_redhat_interfaces: refactor to walk interfaces first https://review.opendev.org/c/opendev/glean/+/843241 | 01:15 |
opendevreview | Ian Wienand proposed opendev/glean master: write_redhat_interfaces: pass multiple networks to output functions https://review.opendev.org/c/opendev/glean/+/843242 | 01:15 |
opendevreview | Ian Wienand proposed opendev/glean master: [wip] write out ipv6 https://review.opendev.org/c/opendev/glean/+/843243 | 01:15 |
opendevreview | Ian Wienand proposed opendev/glean master: Clean up TYPE=Ethernet handling https://review.opendev.org/c/opendev/glean/+/843353 | 01:15 |
ianw | oh i think i might have uploaded over fixups made :/ | 01:15 |
opendevreview | Ian Wienand proposed opendev/glean master: write_redhat_interfaces: pass multiple networks to output functions https://review.opendev.org/c/opendev/glean/+/843242 | 01:19 |
opendevreview | Ian Wienand proposed opendev/glean master: Clean up TYPE=Ethernet handling https://review.opendev.org/c/opendev/glean/+/843353 | 01:19 |
opendevreview | Ian Wienand proposed opendev/glean master: [wip] write out ipv6 https://review.opendev.org/c/opendev/glean/+/843243 | 01:19 |
ianw | i think that restored fungi's fix. i'm not sure why it decided to upload everything :/ | 01:21 |
*** rlandy|bbl is now known as rlandy | 01:23 | |
*** ysandeep|out is now known as ysandeep | 01:27 | |
*** rlandy is now known as rlandy|out | 01:31 | |
opendevreview | Ian Wienand proposed opendev/glean master: _network_info: Clean up TYPE=Ethernet handling https://review.opendev.org/c/opendev/glean/+/843353 | 02:45 |
opendevreview | Ian Wienand proposed opendev/glean master: [wip] write out ipv6 https://review.opendev.org/c/opendev/glean/+/843243 | 02:45 |
opendevreview | Ian Wienand proposed opendev/glean master: _network_info: refactor to add ipv4 info at the end https://review.opendev.org/c/opendev/glean/+/843367 | 02:45 |
*** ysandeep is now known as ysandeep|afk | 02:50 | |
opendevreview | Merged openstack/diskimage-builder master: Check and mount boot volume for data extraction with nouuid https://review.opendev.org/c/openstack/diskimage-builder/+/843297 | 03:55 |
*** diablo_rojo_phone is now known as Guest360 | 05:08 | |
*** mnasiadka_ is now known as mnasiadka | 05:08 | |
*** tkajinam_ is now known as tkajinam | 05:11 | |
*** TheMaster is now known as Unit193 | 05:58 | |
opendevreview | Ian Wienand proposed opendev/glean master: [wip] write out ipv6 https://review.opendev.org/c/opendev/glean/+/843243 | 07:22 |
opendevreview | Ian Wienand proposed opendev/glean master: _network_info: simplify to single string https://review.opendev.org/c/opendev/glean/+/843411 | 07:22 |
ianw | clarkb/fungi: ^ 843243 is writing out something like what i think the config files need to look like. probably the next step is to migrate that manually onto some nodes and see what happens on a real system | 07:23 |
*** ykarel_ is now known as ykarel | 07:33 | |
*** ysandeep|rover is now known as ysandeep|rover|lunch | 07:35 | |
*** ysandeep|rover|lunch is now known as ysandeep|rover | 09:23 | |
*** rlandy|out is now known as rlandy | 10:27 | |
*** dviroel_ is now known as dviroel | 11:17 | |
fungi | infra-root: i'm initiating the zuul_reboot playbook in a root screen session on bridge.o.o now. expect it to take 6+ hours to complete | 11:33 |
fungi | i'll keep an eye on it and make sure it doesn't go completely crazy | 11:33 |
fungi | image pulls are in progress now | 11:34 |
fungi | and now it's starting in on the executor stops | 11:34 |
fungi | oh, right, this isn't the 6+6 batching, it's one-by-one so will be waaaay longer than 6 hours | 11:35 |
fungi | possibly several days | 11:36 |
*** ysandeep|rover is now known as ysandeep|rover|break | 11:44 | |
opendevreview | sean mooney proposed opendev/bindep master: Add support for popos https://review.opendev.org/c/opendev/bindep/+/843444 | 12:20 |
opendevreview | sean mooney proposed opendev/bindep master: Add support for popos https://review.opendev.org/c/opendev/bindep/+/843444 | 12:25 |
*** ysandeep|rover|break is now known as ysandeep|rover | 12:36 | |
*** mnaser_ is now known as mnaser | 12:41 | |
mgariepy | hello, is it possible to hold `openstack-ansible-upgrade-infra_lxc-ubuntu-focal` from `837588,28` so i can check it? | 13:14 |
mgariepy | Hostname: ubuntu-focal-ovh-gra1-0029790883 | 13:15 |
fungi | mgariepy: i can set an autohold for a combination of project+job+change yes, if that's still running then it will be held once it fails | 13:38 |
fungi | what problem are you investigating with that job? | 13:39 |
fungi | i'll add it to the hold comment | 13:39 |
fungi | failed: [aio1_keystone_container-c516e6a5 -> aio1_utility_container-4f947b4e(172.29.236.230)] (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false} | 13:43 |
fungi | i guess it's that you want to know why that task is failing, but with no_log set on it you don't have any stdout/stderr to provide further clues? | 13:43 |
fungi | in theory you could make a separate do-not-merge change which removes the no_log from that task and make it depends-on 837588 | 13:44 |
fungi | (for openstack.osa.db_setup "Create database for service") | 13:45 |
fungi | anyway, i've set an autohold in case that helps: | 13:46 |
mgariepy | something is holding the socket in a container. | 13:46 |
fungi | zuul-client autohold --tenant=openstack --project=opendev.org/openstack/openstack-ansible-repo_server --job=openstack-ansible-upgrade-infra_lxc-ubuntu-focal --ref=refs/changes/88/837588/28 --reason="mgariepy investigating investigating database setup task failure masked by no_log" --count=1 | 13:46 |
mgariepy | thanks | 13:46 |
jrosser_ | fungi: it's suspected to be a race condition in swapping the xinetd galera loadbalancer check to one using a systemd socket activated service | 13:46 |
mgariepy | can you grab my key from gerrit ? | 13:46 |
fungi | not easily, but you can stick a copy on paste.opendev.org | 13:47 |
jrosser_ | the db task failure is just a side effect of the db being down from the perspective of the loadbalancer | 13:47 |
fungi | (a copy of the public key obviously, not the private key!) | 13:47 |
mgariepy | https://paste.openstack.org/show/bmEEcIcyQre3D8rn76hz/ | 13:49 |
mgariepy | fungi, yep i know how ssh works :) haha | 13:49 |
fungi | thanks! | 13:50 |
fungi | ze01 reboot appears to have happened successfully ~1.5 hours ago, and https://zuul.opendev.org/components reports it's running 6.0.1.dev34 b1311a590 while ze02 (5.2.6.dev33 a89ce345) is now paused for graceful stop | 14:00 |
fungi | clarkb: ^ | 14:00 |
fungi | so that was only an hour to restart ze01, not as bad as i expected | 14:02 |
fungi | 11:34z is when i saw the stopping begin for ze01, and last reports system boot at 12:34z | 14:03 |
fungi | ze02 seems to be taking longer to stop though | 14:03 |
*** artom_ is now known as artom | 14:04 | |
*** bhagyashris is now known as bhagyashris|ruck | 14:22 | |
*** ysandeep|rover is now known as ysandeep|dinner | 14:27 | |
*** ysandeep|dinner is now known as ysandeep | 14:50 | |
fungi | i've initiated a borg prune on backup02.ca-ymq-1.vexxhost.o.o since it's warning us about being at 90% capacity | 14:53 |
*** Guest360 is now known as diablo_rojo_phone | 15:00 | |
fungi | ze02 has been down to 1 build running for the past 40 minutes, so hopefully it will pop at any time | 15:06 |
fungi | and there it goes! | 15:08 |
fungi | so that was roughly 2.5 hours for ze02 | 15:09 |
mgariepy | the job has failed. what user should i use to log in ? | 15:12 |
fungi | i need to add your key to the node, just a moment | 15:15 |
mgariepy | cool | 15:15 |
mgariepy | :) | 15:15 |
fungi | mgariepy: ssh root@149.202.178.249 | 15:18 |
fungi | and let us know when you're done with the node so we can release the hold for it | 15:18 |
mgariepy | ok thanks | 15:18 |
clarkb | fungi: thank you for getting that started. And ya the idea is we could potentially run this continuously in the background which is why it is serial: 1. Thinking about that though, we might be able to increase that to serial: 2 for the executors and serial: 4 for the mergers? Starting conservative seemed safer though | 15:19 |
fungi | agreed, slow for now allows us to spot problems before they get as out of hand | 15:19 |
fungi | the "running builds" graph in the zuul-status dashboard on grafana provides a great view of the per-executor burndown progress | 15:28 |
fungi | as well as fairly accurate timing of each one at a glance | 15:29 |
fungi | given the starting point for build count on ze03 was slightly higher than for ze02, i'm guessing it will take roughly the same time or a little longer. hopefully no more than 3 hours | 15:30 |
clarkb | the wildcard in that is long paused jobs | 15:31 |
clarkb | confusingly they don't even show up as running ansible processes on the host if you do direct inspection as they are merely zuul state. It's possible that paused job state could be moved into the db instead and we could have executors shut down more quickly? | 15:31 |
opendevreview | Merged openstack/project-config master: elastic-recheck: allow releasers to merge/delete https://review.opendev.org/c/openstack/project-config/+/840455 | 15:33 |
opendevreview | Shnaidman Sagi (Sergey) proposed openstack/project-config master: Add ops to openstack-ansible-sig channel https://review.opendev.org/c/openstack/project-config/+/843492 | 15:33 |
fungi | oh, that's an interesting idea. i guess the down side is that if you have a paused job which is ready to be unpaused just after the graceful signal is set, it will have to wait for all the other active builds for other buildsets to complete | 15:34 |
fungi | #status log Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org bringing filesystem utilization down from 90% to 54% | 15:40 |
opendevstatus | fungi: finished logging | 15:40 |
clarkb | re the gerrit 3.6 upgrade oddity I mentioned yesterday: you can apparently run the command online after upgrading to 3.5.2 or newer 3.5. Then only upgrade to 3.6 once that is complete. 3.5 will write new content in the format 3.6 wants. It is only the old content that needs to be forward ported | 15:43 |
clarkb | that shouldn't be too bad | 15:44 |
*** marios is now known as marios|out | 15:44 | |
fungi | yeah, i wondered if the migration path was something like that. very cool | 15:45 |
mgariepy | fungi, thanks you can remove the hold :) | 15:45 |
mgariepy | it did fail at some other tasks this time. | 15:45 |
mgariepy | i'll try to reproduce in a vm here instead. | 15:45 |
fungi | mgariepy: appreciated. do you need the autohold reset for another recheck of that patchset? | 15:45 |
mgariepy | hmm it would be easier. | 15:46 |
mgariepy | i'll recheck and it will auto-hold ? | 15:46 |
fungi | yes, i've set a new autohold now for that same change | 15:46 |
mgariepy | i just did order the recheck. | 15:47 |
fungi | if you keep rechecking it, then the next time it fails zuul will hold the node for that same job | 15:47 |
mgariepy | that's handy :D | 15:47 |
fungi | so if it succeeds, recheck again until it fails. we don't have to redo the autohold after successful runs | 15:47 |
fungi | only if it catches a build failure | 15:48 |
mgariepy | when it holds if i do recheck will it release the old instance? | 15:48 |
fungi | no, i manually released the old one by deleting the original autohold for it | 15:48 |
mgariepy | ok | 15:49 |
mgariepy | perfect. | 15:49 |
fungi | nodes are held indefinitely once caught by an autohold, until an admin deletes the autohold for it | 15:49 |
mgariepy | i'll check the status after lunch if it has failed i'll poke you to add my key. | 15:49 |
fungi | or until something happens to the server instance (they're in clouds, after all) | 15:49 |
fungi | perfect. i'll be around | 15:49 |
mgariepy | thanks | 15:49 |
fungi | yw | 15:49 |
clarkb | re setting serial to 2 for executors we're already running low on available executor job slots with serial 1 so that may not be a good idea | 15:52 |
fungi | yeah, agreed. this way at least it's not heavy impact even under load | 15:53 |
*** frenzyfriday|ruck is now known as frenzy_friday | 15:54 | |
fungi | looks like we had a bump in activity ~15 minutes ago and are now effectively maxed out on node quota too | 15:54 |
fungi | the executors will presumably recover a bit once that spike settles out | 15:54 |
fungi | the executors can only start new builds when there are nodes available for them anyway, so if it's the builds-starting governor kicking in to take them out of accepting status, then that shouldn't last too long | 15:56 |
*** ysandeep is now known as ysandeep|out | 16:06 | |
fungi | clarkb: ianw (if you're not out friday): just a heads up that odds are the executor restarts will accelerate during our lower activity period around utc midnight, so there's a good chance the merger and scheduler/web restarts will happen late in my day or once i'm asleep | 16:09 |
clarkb | ya I'll try to continue to keep an eye on it | 16:10 |
fungi | thanks! | 16:10 |
clarkb | once the executors are done mergers should happen very quickly. Then we'll slow down with the schedulers again | 16:10 |
clarkb | I removed frickler's acl package install from the ansible 5 devstack test change and it seems to be happy so far implying our image updates did the trick. | 16:21 |
fungi | perfect | 16:22 |
clarkb | Are there other big classes of test job that we want to try and check before we start talking more broadly about updating our default? The tripleo jobs come to mind but I can't keep their setup straight enough to know what job to modify to get broad testing done and they are pretty in touch with ansible upstream so shouldn't be too much effort for them to figure out once we tell everyone it is coming | 16:22 |
clarkb | But I'm thinking maybe email on ~Monday saying expect default ansible version in opendev zuul to be 5 across the board at the end of June | 16:23 |
fungi | that sounds like good timing. gives folks a chance to knock it out after post-summit downtime | 16:23 |
clarkb | yup and there is an easy way to override to 2.9 if necessary | 16:25 |
frickler | clarkb: thanks, for the devstack test, that's good news. I was planning to test kolla jobs, too | 16:30 |
clarkb | ++ | 16:31 |
fungi | clarkb: it came up a little while back as an open question, whether setuptools's experimental pyproject.toml metadata support works with pbr-using packages. i managed to get it going on a personal project: https://mudpy.org/gitweb?p=mudpy.git;a=commitdiff;h=e6f6c65d | 17:06 |
fungi | only problem was that i couldn't completely delete setup.cfg because pbr still consults it for the package name, apparently | 17:07 |
fungi | and setup.py of course, since we still have to set pbr=true there | 17:07 |
clarkb | neat | 17:07 |
clarkb | so I guess we're most of the way there for when/if that transition is forced on us | 17:08 |
fungi | yeah | 17:08 |
clarkb | though some repos may consider just dropping pbr since the tools do a lot of what it did | 17:08 |
fungi | well, they'd need to replace it with other similar setuptools plugins | 17:08 |
fungi | and for example if they use the semver handling in pbr that's not an available feature in the setuptools-scm plugin | 17:09 |
clarkb | right, though does anyone use that? | 17:11 |
clarkb | I do think a transition to upstream tools if they support the bulk of what is needed is a good sunset for pbr | 17:11 |
clarkb | pbr exists because the upstream tools didn't do a bunch of straightforward stuff they should do | 17:12 |
clarkb | ze03 has a very stubborn last job :) | 17:13 |
clarkb | kolla-ansible-centos8s-source-upgrade-ceph-ansible retries a lot and is non voting in check and does more than three attempts | 17:23 |
* clarkb is going to push a change to make that an experimental job for them instead | 17:24 | |
fungi | clarkb: the openstack release automation uses semver headers in all the service projects in order to force the master branch to start reporting development versions for the next semver major release immediately after each coordinated release | 17:25 |
clarkb | oh it's on stable victoria too | 17:25 |
clarkb | fungi: oh neat | 17:25 |
clarkb | fungi: setuptools_scm does do versioning from git (and hg) I wonder if it can be convinced to report the new dev versions in a similar way | 17:26 |
clarkb | maybe by doing a second tag on a new commit that sets the ver as a dev ver or something | 17:27 |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Add a repository for the Large Scale SIG https://review.opendev.org/c/openstack/project-config/+/843534 | 17:29 |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Add a repository for the Large Scale SIG https://review.opendev.org/c/openstack/project-config/+/843534 | 17:31 |
fungi | clarkb: https://review.opendev.org/c/openstack/nova/+/833243 is an example release bot change | 17:34 |
fungi | i was wrong, it sets Sem-Ver: feature so that dev versions will appear to belong to the next minor (feature instead of api-break) | 17:35 |
clarkb | remote: https://review.opendev.org/c/openstack/kolla-ansible/+/843536 Cleanup zuul jobs a bit | 17:36 |
fungi | but still, same end result. it allows the project to differentiate dev commits on master from dev commits on the most recent stable branch (where dev versions will appear to belong to the next patch level instead) | 17:36 |
clarkb | fwiw I think we should encourage overrides for things like timeouts and attempts to happen more to the leaf end of the job trees than the trunk | 17:36 |
fungi | and it's not just openstack service projects, looks like all the libs do it too | 17:37 |
clarkb | it's far too easy to forget things have been overridden then run a job 5 times before running success or failure | 17:37 |
clarkb | fungi: ya I'm wondering if setuptools scm's version management supports something similar | 17:37 |
clarkb | https://github.com/pypa/setuptools_scm/blob/main/src/setuptools_scm/version.py#L201-L203 it has stuff to do something along those lines | 17:38 |
clarkb | ze04 is paused now | 17:39 |
fungi | yep, looks like ze03 just restarted in the last few minutes | 17:39 |
clarkb | wow my change to kolla-ansible updates the attempts value in the base job which causes all the jobs that inherit from it to be run | 17:41 |
clarkb | there are a lot of those jobs and the vast majority are non voting. | 17:42 |
fungi | mgoddard: mnasiadka: ^ if you're around, any history on that? are those just recently broken? | 17:44 |
frickler | I was following the Project Creators Guide, adding the needed-by comment to the project-config patch and that triggered zuul to vote -1 on https://review.opendev.org/c/openstack/governance/+/843535 . can we come up with a better procedure or shall we just add a warning to the guide? | 17:45 |
clarkb | frickler: I don't think the needed by comment is at fault. Zuul doesn't do anything with that information | 17:46 |
frickler | clarkb: it seems it does, because it is a new ps which causes the gov change testing to be canceled | 17:47 |
clarkb | the issue is that the governance change was in the zuul check queue when you pushed a new patchset to its depends on | 17:47 |
frickler | clarkb: yes, but that is exactly what our docs say one should do | 17:47 |
clarkb | ya it's the generic behavior that pushing a new patchset to a depends on causes the child to be kicked out since it was running against code that can no longer be merged | 17:48 |
clarkb | frickler: sure I'm just trying to clarify that the text needed-by doesn't do that. it's any new patchset | 17:48 |
clarkb | (depends-on affects how zuul runs, needed-by does not) | 17:48 |
frickler | yes, I know that, sorry if I worded things wrong. it is submitting the new ps with the needed-by comment that triggers this, not the content of the comment | 17:49 |
clarkb | I think we can update the docs to suggest adding a retry comment after the -1 | 17:49 |
clarkb | since it should reenqueue just fine and report normally as long as no new patchsets are pushed in that time | 17:50 |
fungi | or wait for the governance change to complete testing before revising the commit message on the project-config change | 17:50 |
fungi | or add the needed-by first (and use the change-id in it rather than a gerrit url) | 17:51 |
frickler | isn't using change-ids deprecated? | 17:51 |
fungi | for depends-on, yes. because depends-on is something zuul uses | 17:52 |
fungi | needed-by is just a hint to reviewers | 17:52 |
clarkb | another approach would be to use a consistent topic | 17:52 |
clarkb | rather than updating the commit message. Or leaving a comment on the changes to tie them together | 17:52 |
frickler | the guide also says to use "new-project" as topic | 17:52 |
clarkb | I think I like ^ best but would require the openstack tc to operate differently | 17:52 |
clarkb | ya so maybe if we set the same topic on both changes that is good enough for reviewers to find them | 17:53 |
fungi | the new-project topic was something we came up with ages ago to help us streamline reviewing project additions, but is probably not that relevant any longer | 17:54 |
clarkb | unless we wanted to use it now to tie those two sets of changes together | 17:54 |
clarkb | rather than having the order of operations problem | 17:54 |
fungi | ooh, or a new use for gerrit's "hashtags" ;) | 17:54 |
* fungi is still looking for a nail to beat with that hammer | 17:55 | |
clarkb | because as you say it is just a hint to reviewers so let's use gerrit's ability for that rather than artificially making new commits and making zuul unhappy | 17:55 |
frickler | o.k, so https://review.opendev.org/q/hashtag:new-project+(status:open%20OR%20status:merged) then | 17:56 |
frickler | and https://review.opendev.org/q/topic:large-scale-sig+status:open for the other conjunction | 17:57 |
fungi | i was thinking more like adding a hashtag to the project-config change which says where to find the corresponding governance change | 17:58 |
fungi | but also, could just use a basic review comment | 17:58 |
clarkb | ya maybe asking people to leave a comment with a link to the governance change is simplest | 17:58 |
clarkb | I'd be happy with that | 17:58 |
fungi | same. i always look at existing comments when i'm reviewing changes anyway | 17:58 |
clarkb | "Leave a comment with a link to the openstack governance change once that is pushed" | 18:02 |
clarkb | something simple like that | 18:02 |
opendevreview | Merged openstack/diskimage-builder master: Ensure passwd is installed on RH and derivatives https://review.opendev.org/c/openstack/diskimage-builder/+/840352 | 18:03 |
frickler | "Leave a review comment ..." otherwise I fear it still might be read as adding a comment to the commit message | 18:03 |
clarkb | ++ | 18:03 |
frickler | I can do a patch for that tomorrow unless one of you wants to do it right away | 18:04 |
clarkb | I think it is fine to do tomorrow. New projects don't happen every day | 18:04 |
fungi | yeah, that sounds like a great improvement, thanks frickler! | 18:06 |
clarkb | oh hey we have jammy wheels now | 18:27 |
mgariepy | fungi, is the hold good also for a gate check ? | 18:57 |
fungi | mgariepy: yes, it's independent of pipeline | 18:58 |
mgariepy | ok | 18:58 |
mgariepy | i'll see if it merges now, if not, i'll debug it there. | 18:59 |
fungi | good luck! | 18:59 |
mgariepy | fungi, can you give me access to the vm :D | 19:22 |
fungi | on it | 19:24 |
fungi | mgariepy: ssh root@23.253.56.198 | 19:26 |
mgariepy | i'm in it :D | 19:27 |
fungi | ze04 rebooted so we're on to ze05 now | 19:46 |
clarkb | it is interesting to see the job counts fall off on the grafana graphs as the reboots happen | 20:05 |
corvus | it's the zuul wave | 20:06 |
BlaisePabon[m] | Late breaking personal news... I talked my management into transferring me from Product Management to become the first DevOps SRE. | 20:31 |
BlaisePabon[m] | My first project is to configure zuul-ci and Gerrit for our developers. | 20:31 |
fungi | congrats! | 20:34 |
BlaisePabon[m] | I'm both elated and terrified. I have set up CI toolchains with Jenkins and maven, but zero `zuul-ci` experience. | 20:35 |
fungi | if you're curious how we do it, you can find our deployment playbooks and docker-compose files in https://opendev.org/system-config | 20:35 |
fungi | er, https://opendev.org/opendev/system-config | 20:35 |
BlaisePabon[m] | Thank you fungi , that means more to me than you can imagine!!! | 20:36 |
fungi | also https://zuul.opendev.org/ is a real production deployment you can browse around and check out | 20:36 |
fungi | and we have a zuul-status dashboard at https://grafana.opendev.org/ with lots of stats trended for the deployment | 20:37 |
BlaisePabon[m] | I'll look later ... my opendev account is my personal account. | 20:37 |
fungi | well, all of that is anonymously accessible too, no need to log into anything just to look around | 20:38 |
BlaisePabon[m] | Oh! Never mind, I can read this from here. | 20:38 |
fungi | https://zuul.opendev.org/components gives a nice overview of all the services in our zuul deployment. you can see there that we're in the middle of a rolling update now | 20:40 |
mgariepy | fungi i think you can release the hold now thanks for your hlep | 20:44 |
mgariepy | help** | 20:44 |
fungi | mgariepy: done, and you're welcome! | 20:49 |
fungi | did you manage to work out what was conflicting on the socket? | 20:51 |
mgariepy | we do restart the container and the socket needs an After=network.target. | 20:55 |
mgariepy | https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/843547 | 20:56 |
*** dviroel is now known as dviroel|out | 21:00 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Perform package upgrades prior to zuul cluster node reboots https://review.opendev.org/c/opendev/system-config/+/843549 | 21:06 |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Fix propose-updates job for requirements (3rd attempt) https://review.opendev.org/c/openstack/project-config/+/843550 | 21:06 |
frickler | clarkb: fungi: ^^ another round, if you could approve (feel free to modify if needed) before the next periodic trigger kicks off, so we get further testing, that'd be great | 21:08 |
frickler | I didn't make progress yet on the manual trigger task | 21:08 |
clarkb | looking | 21:09 |
*** rlandy is now known as rlandy|biab | 21:23 | |
fungi | mgariepy: ah, so it wasn't a socket conflict, it was just starting too early? | 21:40 |
opendevreview | Merged openstack/project-config master: Fix propose-updates job for requirements (3rd attempt) https://review.opendev.org/c/openstack/project-config/+/843550 | 21:48 |
mgariepy | yep indeed, | 21:59 |
mgariepy | sometimes it was ok tho .. | 21:59 |
fungi | ze05 has rebooted and ze06 is paused now | 22:21 |
clarkb | ianw: not sure if you caught it but gerrit 3.4.5 and 3.5.2 exist now https://review.opendev.org/c/opendev/system-config/+/843298 will bump us up to those versions in our images. The 3.5.2 or newer release is important before we upgrade to 3.6 in order to be able to run a new notedb curation command prior to 3.6 upgrade happening. | 22:32 |
fungi | now ze07 is in progress | 23:46 |
Clark[m] | I suspect it may accelerate now as there are fewer jobs which means fewer opportunities for a 3 hour long job to hold us up | 23:49 |
corvus | i think there may be a bug in the graceful shutdown of mergers; so watch out for that and if we get stuck on mergers, just manually 'docker-compose down' them to get it moving again | 23:52 |
corvus | i'll try to look at the merger graceful bug soon | 23:52 |
Clark[m] | Specifically the issue should cause this playbook to stall but not exit/fail. It's the merger process not exiting cleanly right? So ya a docker-compose down out of band would force that to happen allowing the playbook to continue | 23:56 |
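
The Sem-Ver exchange between fungi and clarkb (17:25 to 17:36) describes how a commit header changes which release a development version appears to belong to: with `Sem-Ver: feature`, dev versions on master point at the next minor release, while a branch without the header points at the next patch level. A minimal sketch of that rule follows. It is not pbr's actual code; the function name `next_dev_version`, its parameters, and the `.devN` suffix layout are assumptions made for illustration, and treating `api-break` as a major bump and `deprecation` like `feature` is an assumption about pbr's documented header symbols.

```python
from typing import Iterable


def next_dev_version(last_tag: str,
                     sem_ver_symbols: Iterable[str],
                     commits_since_tag: int) -> str:
    """Return an illustrative dev version string (hypothetical helper).

    last_tag          -- most recent release tag, in "X.Y.Z" form
    sem_ver_symbols   -- Sem-Ver header values seen in commits since that tag
    commits_since_tag -- number of commits since the tag
    """
    major, minor, patch = (int(part) for part in last_tag.split("."))
    symbols = {symbol.strip().lower() for symbol in sem_ver_symbols}

    if "api-break" in symbols:
        # Assumed: an API break targets the next major release.
        major, minor, patch = major + 1, 0, 0
    elif "feature" in symbols or "deprecation" in symbols:
        # Per the log: Sem-Ver: feature targets the next minor release.
        minor, patch = minor + 1, 0
    else:
        # No header: dev versions appear to belong to the next patch level.
        patch += 1

    return f"{major}.{minor}.{patch}.dev{commits_since_tag}"


# master after a release bot change carrying "Sem-Ver: feature"
print(next_dev_version("25.0.0", ["feature"], 3))
# stable branch with no Sem-Ver header
print(next_dev_version("25.0.0", [], 3))
```

The two calls mirror the distinction fungi draws at 17:36: the same tag yields a next-minor dev version on master and a next-patch dev version on the stable branch.
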
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!