| Nick | Message | Time |
|---|---|---|
| @mnasiadka:matrix.org | > <@clarkb:matrix.org> mnasiadka: ok posted some thoughts to https://review.opendev.org/c/opendev/system-config/+/988310/ In particular I wonder if we can run prometheus and greptimedb on the same host and simplify things a bit? | 04:53 |
| I think that should be possible, I’ll rework the patches | ||
| -@gerrit:opendev.org- Zuul merged on behalf of Jack Hodgkiss: [openstack/diskimage-builder] 986427: fix: add support for `cloud-init` in `Ubuntu Resolute` https://review.opendev.org/c/openstack/diskimage-builder/+/986427 | 06:39 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed on behalf of Mohammed Naser: [opendev/system-config] 980840: Add Prometheus monitoring service https://review.opendev.org/c/opendev/system-config/+/980840 | 08:39 | |
| -@gerrit:opendev.org- Sylvain Bauza proposed: [opendev/system-config] 988406: Add bots to #openstack-agentic-worfklows https://review.opendev.org/c/opendev/system-config/+/988406 | 09:10 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/988310 | 09:17 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/988310 | 09:21 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/988310 | 09:22 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/988310 | 09:23 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/988310 | 09:44 | |
| -@gerrit:opendev.org- Mohammed Naser proposed: [opendev/system-config] 980994: Deploy node_exporter across all managed hosts https://review.opendev.org/c/opendev/system-config/+/980994 | 09:44 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/988310 | 09:44 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed on behalf of Mohammed Naser: [opendev/system-config] 980840: Add Prometheus monitoring service https://review.opendev.org/c/opendev/system-config/+/980840 | 09:50 | |
| -@gerrit:opendev.org- Elod Illes proposed: [openstack/project-config] 988430: [release-tools] Fix dist_name fetch for upper bump https://review.opendev.org/c/openstack/project-config/+/988430 | 11:23 | |
| -@gerrit:opendev.org- Mohammed Naser proposed: [opendev/system-config] 980994: Deploy node_exporter across all managed hosts https://review.opendev.org/c/opendev/system-config/+/980994 | 13:27 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/988310 | 13:27 | |
| @mnasiadka:matrix.org | Clark: looking at your comments on node_exporter patch - and I'm leaning towards writing a local role maybe? | 14:20 |
| @clarkb:matrix.org | mnasiadka: a local role to manage the node exporter config that builds atop the upstream stuff? I think that makes sense | 14:50 |
| @clarkb:matrix.org | basically a concrete way of encoding how we want node_exporter to run? | 14:51 |
| @clarkb:matrix.org | infra-root I think ansible upgrade on bridge (https://review.opendev.org/c/opendev/system-config/+/976282/) is still a potential go for today. However, I forgot that there is a parent change to make borg testing happy on older systems which has not had the same level of review | 14:51 |
| @clarkb:matrix.org | infra-root maybe we can land that first change nowish if there are people able to review it. Confirm that borg installs still look good then depending on timing and what else comes up in the meantime proceed with the ansible 9 change? | 14:52 |
| @mnasiadka:matrix.org | Clark: yeah, we could even run it as a container as well, Kolla does that and mounts / as /host inside the container read only - https://opendev.org/openstack/kolla-ansible/src/commit/8510b763fec5fbfdcd677177809c682eb69e97dc/ansible/roles/prometheus-node-exporters/defaults/main.yml#L42 | 14:52 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/diskimage-builder] 988472: CI: Switch dib-devstack to use openstack.cloud https://review.opendev.org/c/openstack/diskimage-builder/+/988472 | 14:55 | |
| @fungicide:matrix.org | i've approved the borg focal fix (976286) now | 15:59 |
| @fungicide:matrix.org | also +2 on ansible 9 for bridge, but didn't approve yet in case we want to coordinate hands available | 15:59 |
| @fungicide:matrix.org | as for the resolute mirror, the mirror.ubuntu volume is currently taking up 986gb for 3 releases so we'll want at least 400gb of free quota on that volume (i recommend we increase from 1.3tb to 1.5 just to be safe) | 16:02 |
| @fungicide:matrix.org | and mirror.ubuntu-ports is at 1.01tb used so we ought to raise it to 1.6tb at least | 16:03 |
| @clarkb:matrix.org | I think a quick check on the focal borg update first is a good idea just so that we don't break ansible if we need to address borg first | 16:05 |
| @fungicide:matrix.org | assume we're looking at ~900gb of additional space resolute is going to take up between the two of those, right now afs01.dfw is at 80.2% used (4.24/5.28tb), we're looking at landing around 97% full on /vicepa | 16:06 |
| @clarkb:matrix.org | and then we may also want to move the old ansible venv aside as that deploys so that we can restore it quickly if necessary. But I think checking borg first then proceeding is a good plan | 16:06 |
| @clarkb:matrix.org | fungi: note the change I proposed is only for x86_64 and we only have x86_64 images in zuul currently | 16:06 |
| @fungicide:matrix.org | given this is all napkin math already, 3% headroom doesn't feel safe to me. i expect my estimates are already +/- more than that on the error bars | 16:07 |
| @clarkb:matrix.org | agreed, but I also think we're avoiding arm64 for now | 16:07 |
| @fungicide:matrix.org | yeah, if we go with just amd64 and no arm64 yet, that's going to be 89% utilization on afs01.dfw /vicepa | 16:08 |
| @fungicide:matrix.org | still a bit uncomfortable, but doable i think | 16:08 |
| @clarkb:matrix.org | in particular arm64 gets much less utilization so I think it is less of a priority | 16:08 |
| @clarkb:matrix.org | we can sort arm out as a secondary step | 16:09 |
| @fungicide:matrix.org | we'll need to clear more space or add more cinder volumes before we can talk about mirroring anything else large (including resolute arm64) | 16:09 |
| -@gerrit:opendev.org- Zuul merged on behalf of Goutham Pacha Ravi: [openstack/project-config] 987313: Add devstack-plugin-lustre project https://review.opendev.org/c/openstack/project-config/+/987313 | 16:20 | |
| @fungicide:matrix.org | Clark: okay, so are you in favor of me increasing the mirror.ubuntu quota from 1.3tb to 1.5tb and then approving 988279 and 988280? (mirror.deb-docker has plenty of room, current packages aren't even using 1% of its quota anyway) | 16:25 |
| @clarkb:matrix.org | fungi: I think so. Do we want to coordinate that around the ansible 9 update at all? | 16:26 |
| @clarkb:matrix.org | and yes the docker update seemed super safe | 16:26 |
| @fungicide:matrix.org | i don't expect the bridge ansible upgrade to impact the mirror setup. i'll probably need to run reprepro manually under screen anyway in order to avoid timeouts | 16:30 |
| @fungicide:matrix.org | worst case, bridge ansible is too broken to deploy the mirror addition | 16:31 |
| @fungicide:matrix.org | and then we work around that or wait until bridge is back on track | 16:31 |
| -@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 976286: Pin llfuse during borg install on Focal https://review.opendev.org/c/opendev/system-config/+/976286 | 16:32 | |
| @clarkb:matrix.org | fungi: ack wfm then. | 16:32 |
| @clarkb:matrix.org | fungi: re bridge the venv exists at `/usr/ansible-venv` I'm thinking that I can cp -r /usr/ansible-venv /usr/ansible-venv.8 or similar? | 16:33 |
| @clarkb:matrix.org | fungi: so that we can easily move it back to /usr/ansible-venv if the new version breaks for any reason? | 16:33 |
| @clarkb:matrix.org | (mostly worried about an unexpected chicken and egg that might need ansible to fix ansible) | 16:34 |
| @fungicide:matrix.org | #status log Increased AFS quota for the mirror.ubuntu volume by 200GB (from 1.3TB to 1.5TB) in order to make room for Ubuntu 26.04 LTS (Resolute) packages | 16:36 |
| @status:opendev.org | @fungicide:matrix.org: finished logging | 16:36 |
| @clarkb:matrix.org | ok the borg backup job just succeeded. So my next step is to "backup" the ansible venv and then we can approve the upgrade for ansible | 16:37 |
| @clarkb:matrix.org | fungi: does that cp -r seem reasonable? I think that should work well enough for this use case as it's not crossing host or fs boundaries | 16:37 |
| @fungicide:matrix.org | Clark: `cp -r` is probably fine, but you might want `cp -a` instead | 16:38 |
| @clarkb:matrix.org | oh yup preserving links is probably important for a venv | 16:38 |
| @fungicide:matrix.org | also you'll keep the same file modes, ownership, and timestamps | 16:39 |
| @clarkb:matrix.org | ok running `cp -a /usr/ansible-venv /usr/ansible-venv.8` now | 16:40 |
| @clarkb:matrix.org | that is done and a quick glance at the contents of the new dir lgtm | 16:41 |
| @fungicide:matrix.org | yeah, lgtm too | 16:42 |
| @clarkb:matrix.org | fungi: anything else you can think of before we approve https://review.opendev.org/c/opendev/system-config/+/976282 ? | 16:42 |
| @clarkb:matrix.org | my typing today is terrible | 16:43 |
| @fungicide:matrix.org | nope, i went ahead and approved it now | 16:45 |
| @fungicide:matrix.org | also the resolute mirror additions | 16:45 |
| @fungicide:matrix.org | all three are in the gate | 16:45 |
| @fungicide:matrix.org | zuul is estimating they'll merge around 17:20 utc, ~25 minutes from now | 16:56 |
| @fungicide:matrix.org | that should be enough time that the hourly deploys will already have wrapped up | 16:56 |
| @clarkb:matrix.org | Perfect | 16:57 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/diskimage-builder] 988472: CI: Switch dib-devstack to use openstack.cloud https://review.opendev.org/c/openstack/diskimage-builder/+/988472 | 17:17 | |
| -@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 988279: Mirror Ubuntu Resolute Docker packages https://review.opendev.org/c/opendev/system-config/+/988279 | 17:18 | |
| -@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: | 17:18 | |
| - [opendev/system-config] 988280: Mirror Ubuntu Resolute Packages https://review.opendev.org/c/opendev/system-config/+/988280 | ||
| - [opendev/system-config] 976282: Upgrade Ansible on Bridge to Ansible 9 https://review.opendev.org/c/opendev/system-config/+/976282 | ||
| @clarkb:matrix.org | deployment jobs are meandering through bridge now | 17:22 |
| @clarkb:matrix.org | fungi: ^ not sure if you think you need to hold the mirror lock now? | 17:23 |
| @fungicide:matrix.org | nah, the ubuntu mirror starts at 15 after even hours, so it'll be another 50 minutes | 17:26 |
| @clarkb:matrix.org | due to the way things deploy I think we are already using ansible 9 fwiw | 17:26 |
| @fungicide:matrix.org | i'll just wait for the deploys to complete first | 17:26 |
| @clarkb:matrix.org | and the first buildset for docker package mirror updates succeeded so that is a good sign | 17:26 |
| @clarkb:matrix.org | the 1800 UTC hourly run will be the next real test (since the next two buildsets won't exercise much more than what is already exercised I think) | 17:27 |
| @fungicide:matrix.org | infra-prod-bootstrap-bridge is finally running | 17:31 |
| @clarkb:matrix.org | yup though it is a noop at this point as I think the prior two buildsets already updated ansible (per pip freeze in the venv) | 17:32 |
| @fungicide:matrix.org | and succeeded | 17:32 |
| @fungicide:matrix.org | oh, indeed, so we likely ran at least one of them with 9 | 17:33 |
| @clarkb:matrix.org | I think the bridge bootstrapping uses master rather than the change state (which may be a bug I guess) | 17:33 |
| @clarkb:matrix.org | yup exactly. I think this is looking good from early results. 1800 hourlies will be the next test | 17:33 |
| @fungicide:matrix.org | i'll go ahead and do the reprepro manual run now | 17:34 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/diskimage-builder] 988472: CI: Switch dib-devstack to use openstack.cloud https://review.opendev.org/c/openstack/diskimage-builder/+/988472 | 17:36 | |
| @fungicide:matrix.org | oh, huh somehow i missed that the mirror.ubuntu volume has been stale for a month? that one didn't run out of quota though | 17:39 |
| @fungicide:matrix.org | left behind a stale /afs/.openstack.org/mirror/ubuntu/db/lockfile so something killed the reprepro process, likely a point release caused it to timeout | 17:39 |
| @clarkb:matrix.org | oh should we put the mirror in the emergency file and temporarily remove resolute while we address that first? | 17:41 |
| @fungicide:matrix.org | while i probably should have done that, it's running now | 17:42 |
| @fungicide:matrix.org | i'll just keep an eye on it for the next however many days it takes to complete | 17:42 |
| @clarkb:matrix.org | ack | 17:42 |
| @fungicide:matrix.org | it's running under a root screen session on mirror-update, if anyone else needs to check on it | 17:43 |
| @clarkb:matrix.org | 1800 hourlies have begun. I'll keep an eye on them | 18:02 |
| @fungicide:matrix.org | no failures so far, at least | 18:04 |
| @fungicide:matrix.org | all successful: https://zuul.opendev.org/t/openstack/buildset/a4335d0211804e2f9303d7bd53d87fa1 | 18:09 |
| @clarkb:matrix.org | great, the real test will be daily runs early tomorrow UTC time. But this is all pointing in the right direction. I guess if you see any problems with system-config-run jobs, say something as well, as those may catch unexpected things early | 18:10 |
| @scott.little:matrix.org | has anything changed at opendev that might be blocking my ability to push a signed tag? I'm still a member of the starlingx release group. The repo is starlingx/root.git ... git push gerrit vf/gazpacho ... remote: error: branch refs/tags/vf/gazpacho: | 18:28 |
| remote: use a SHA1 visible to you, or get update permission on the ref | ||
| remote: User: slittle1 | ||
| @scott.little:matrix.org | never mind, i see the problem | 18:29 |
| @clarkb:matrix.org | For completeness the most recent change that may have impacted that was the Gerrit 3.12 upgrade last month | 18:30 |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 988535: Drop graphite port 80 redirect https://review.opendev.org/c/opendev/system-config/+/988535 | 19:22 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed wip: [openstack/diskimage-builder] 988472: CI: Switch dib-devstack to use openstack.cloud https://review.opendev.org/c/openstack/diskimage-builder/+/988472 | 19:43 | |
| -@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 988535: Drop graphite port 80 redirect https://review.opendev.org/c/opendev/system-config/+/988535 | 20:25 | |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 893571: DNM Forced fail on Gerrit to test 3.13 upgrades and downgrades https://review.opendev.org/c/opendev/system-config/+/893571 | 23:12 | |
| @clarkb:matrix.org | I cleared out my autoholds for etherpad 2.7.3 and gitea 1.26.1 and replaced them with a gerrit 3.12 autohold to test the upgrade process to 3.13 | 23:13 |
| @clarkb:matrix.org | I still want to upgrade to 3.12.7 on Friday if possible, but this way I can start testing the upgrade process tomorrow hopefully and get a head start | 23:13 |
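
The AFS capacity estimates discussed around 16:02 to 16:08 are napkin math, and a rough Python sketch of that arithmetic may make it easier to re-run with other figures. The function and constant names here are hypothetical, the inputs are the rounded numbers quoted in the log, and 1 TB is assumed to equal 1000 GB; none of this is a tool used by the OpenDev team.

```python
def projected_utilization(used_tb: float, capacity_tb: float, added_tb: float) -> float:
    """Return the fraction of a partition in use after adding added_tb of data."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return (used_tb + added_tb) / capacity_tb


def quota_headroom_gb(quota_gb: float, used_gb: float) -> float:
    """Return the free quota left on a volume, in GB."""
    return quota_gb - used_gb


# Rounded figures quoted in the log (assumption: 1 TB == 1000 GB).
VICEPA_USED_TB = 4.24           # afs01.dfw /vicepa, currently used
VICEPA_CAPACITY_TB = 5.28       # afs01.dfw /vicepa, total size
RESOLUTE_BOTH_ARCHES_TB = 0.9   # "~900gb" across mirror.ubuntu and mirror.ubuntu-ports
UBUNTU_QUOTA_GB = 1500          # mirror.ubuntu quota after the increase
UBUNTU_USED_GB = 986            # mirror.ubuntu usage for three releases

current = projected_utilization(VICEPA_USED_TB, VICEPA_CAPACITY_TB, 0.0)
both_arches = projected_utilization(
    VICEPA_USED_TB, VICEPA_CAPACITY_TB, RESOLUTE_BOTH_ARCHES_TB
)
headroom = quota_headroom_gb(UBUNTU_QUOTA_GB, UBUNTU_USED_GB)

print(f"current /vicepa utilization: {current:.1%}")
print(f"with Resolute on both architectures: {both_arches:.1%}")
print(f"mirror.ubuntu free quota after increase: {headroom} GB")
```

With these rounded inputs the projection for both architectures comes out at roughly 97%, in line with the estimate in the log, and the free quota on mirror.ubuntu after the increase exceeds the 400 GB that was called for. Small differences from the quoted percentages are expected because the log figures are themselves rounded.
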