opendevreview | Clark Boylan proposed opendev/system-config master: Cleanup users launch-node.py might have used https://review.opendev.org/c/opendev/system-config/+/816771 | 00:02 |
---|---|---|
opendevreview | Clark Boylan proposed opendev/system-config master: Remove python3-nova-agent package from our servers https://review.opendev.org/c/opendev/system-config/+/816772 | 00:02 |
opendevreview | Merged opendev/system-config master: Don't set lodgeit db dir perms https://review.opendev.org/c/opendev/system-config/+/816754 | 00:04 |
ianw | excellent, the 9-stream build fails with some sort of yaml library exception : TypeError: load() missing 1 required positional argument: 'Loader' | 00:19 |
ianw | this must be pyyaml 6 | 00:21 |
opendevreview | Ian Wienand proposed openstack/project-config master: nodepool elements: use yaml.safe_load https://review.opendev.org/c/openstack/project-config/+/816774 | 00:26 |
ianw | ^ this explains why it passed dib/nodepool gate | 00:26 |
Clark[m] | Ya you need to explicitly opt into unsafe now | 00:26 |
*** odyssey4me is now known as Guest4951 | 00:56 | |
opendevreview | Merged openstack/project-config master: nodepool elements: use yaml.safe_load https://review.opendev.org/c/openstack/project-config/+/816774 | 01:08 |
ianw | sigh, now another problem | 02:13 |
ianw | 2021-11-05 01:51:07.189 | Updating cache of https://opendev.org/openstack/openstack.git in /opt/dib_cache/source-repositories/openstack_179b61797588a5983c2f97c6533dca570c8f887d with ref * | 02:13 |
ianw | 2021-11-05 01:51:13.993 | Could not access submodule 'adjutant' | 02:13 |
ianw | 2021-11-05 01:51:13.993 | Could not access submodule 'ansible-hardening' | 02:13 |
ianw | 2021-11-05 01:51:13.993 | Could not access submodule 'ansible-role-collect-logs' | 02:13 |
ianw | ... and so on ... | 02:13 |
ianw | ... is dib broken, something about a new git, gitea, or this repo ... | 02:13 |
*** sshnaidm is now known as sshnaidm|off | 03:06 | |
Clark[m] | Is new git trying to auto update submodules? | 03:19 |
ianw | https://nb01.opendev.org/ubuntu-bionic-0000221037.log failed with this error | 03:22 |
ianw | https://nb01.opendev.org/ubuntu-focal-0000119628.log was the next build and passed | 03:23 |
Clark[m] | The urls are relative so those updates should work however I wouldn't expect the caching step to actually need to fetch the submodule content | 03:25 |
Clark[m] | Re new git, I wonder if that is a bullseye git behavior update where it actually fetches those? | 03:26 |
ianw | https://nb02.opendev.org/debian-stretch-0000051795.log also failed | 03:26 |
Clark[m] | Since source-repositories runs outside the chroot it would be the bullseye git? | 03:26 |
Clark[m] | Honestly that repo isn't used for anything that I know of and is incomplt these days. We might get away with removing it from caching temporarily if necessary | 03:27 |
ianw | https://nb02.opendev.org/debian-buster-0000061362.log was the build that followed that, and that passed | 03:27 |
Clark[m] | *it is incomplete | 03:28 |
ianw | it seems like possibly running the new git failed once -- may have done something?! -- and further runs are working | 03:28 |
Clark[m] | ianw: maybe we need to see if it affects those older systems specifically? Then my hunch about the chroot is likely wrong | 03:28 |
Clark[m] | Oh I see. Ya maybe something about git repo state on the first pass then git doesn't try again? | 03:29 |
ianw | this would be running with the build-system git, not the guest git | 03:29 |
ianw | i can't seem to replicate it -- but also the git command-lines run by the caching are ... let's say not a priori obvious | 03:30 |
ianw | in short, it happened, twice, on different servers, i don't know why and it doesn't seem to be happening now | 03:31 |
opendevreview | Ian Wienand proposed opendev/system-config master: gerrit: mark file reviewed during testing https://review.opendev.org/c/opendev/system-config/+/816766 | 03:44 |
*** frenzyfriday|sick is now known as frenzy_friday | 04:25 | |
ianw | it just happened again with https://nb01.opendev.org/centos-7-0000237899.log | 04:55 |
ianw | the list of submodules is different now | 04:55 |
ianw | 2021-11-05 04:50:44.107 | Updating cache of https://opendev.org/openstack/openstack.git in /opt/dib_cache/source-repositories/openstack_179b61797588a5983c2f97c6533dca570c8f887d with ref * | 04:55 |
ianw | 2021-11-05 04:50:48.309 | Could not access submodule 'freezer-tempest-plugin' | 04:55 |
ianw | 2021-11-05 04:50:48.309 | Could not access submodule 'python-ironicclient' | 04:55 |
ianw | i'm just about out of time for today, i'm not going to get to dig on this much further | 04:56 |
opendevreview | Ian Wienand proposed opendev/system-config master: gerrit: mark file reviewed during testing https://review.opendev.org/c/opendev/system-config/+/816766 | 05:08 |
opendevreview | Ian Wienand proposed openstack/project-config master: infra-package-needs: skip haveged start on 9-stream https://review.opendev.org/c/openstack/project-config/+/816782 | 06:41 |
opendevreview | Merged openstack/project-config master: infra-package-needs: skip haveged start on 9-stream https://review.opendev.org/c/openstack/project-config/+/816782 | 07:00 |
opendevreview | Merged opendev/system-config master: gerrit: don't chown mariadb container directory https://review.opendev.org/c/opendev/system-config/+/816750 | 09:25 |
opendevreview | Alfredo Moralejo proposed openstack/project-config master: Fix haveged installation in CentOS7 https://review.opendev.org/c/openstack/project-config/+/816813 | 10:07 |
*** jpena|off is now known as jpena | 10:36 | |
*** dviroel|out is now known as dviroel|rover | 10:38 | |
*** jssfr is now known as foorl | 11:33 | |
*** mazzy50981 is now known as mazzy5098 | 12:00 | |
fungi | Clark[m]: ianw: i agree, the openstack/openstack repo is unnecessary to cache, i'd be in favor of filtering it out of the repos list explicitly | 13:45 |
opendevreview | Andre Aranha proposed zuul/zuul-jobs master: Add fips version of jobs needed for OpenStack https://review.opendev.org/c/zuul/zuul-jobs/+/816385 | 14:18 |
clarkb | fungi: re the user changes in https://review.opendev.org/c/opendev/system-config/+/816769/1/playbooks/roles/gerritbot/tasks/main.yaml we are defaulting to uid 11000 with the idea that other bot users could be 11001 etc or 12000 and so on. What I'm not sure about and questioning now is if we install a distro package that creates a new system user will it create that as uid 11001 after | 14:23 |
clarkb | we create that user? | 14:23 |
fungi | it will if 11000 is in the range adduser is willing to use | 14:24 |
fungi | on review.o.o, adduser.conf lists LAST_UID=59999 and LAST_GID=59999 so, yes it will assume no new users should be created lower than 11000 if there's a 11000 in passwd/groups | 14:25 |
clarkb | fungi: a better example might be zk04.opendev.org. Since that has zk running as 10001 (I used it as an example here) | 14:26 |
fungi | at one point we were setting FIRST_UID and FIRST_GID higher than 1000 i thought, but doesn't look like it now | 14:26 |
clarkb | so I guess the next question I have is: is this a problem, and do we need to fix the existing zuul/nodepool/zk uid assignments? | 14:27 |
clarkb | fungi: I suppose as an alternative I can create it as a non system user? | 14:28 |
fungi | it can become a problem if we rsync files or detach/attach a cinder volume during server replacements | 14:28 |
clarkb | then system packages will continue to use their normal range and we already manage our actual users with specific uids so that won't conflict? | 14:28 |
fungi | not sure what you mean by non system user | 14:29 |
clarkb | fungi: a regular user eg system: false/no in that ansible task | 14:30 |
fungi | if you're talking about adduser --system that normally causes it to pick a uid/gid from the "system" range rather than the normal user range | 14:30 |
clarkb | fungi: yes I know. We have created the zuul/nodepool/zookeeper users with a high (10001) uid and set it as a system user in ansible | 14:31 |
clarkb | I'm wondering if it would be better to not create these users as system users so that system packages can continue to pick from the normal range | 14:31 |
clarkb | normal range for system users I mean | 14:31 |
clarkb | then our non system users can have hardcoded uids and since we manage those directly we can keep them from getting too out of sync? | 14:32 |
fungi | i still don't understand. if we explicitly picked a uid/gid outside the "system" range then it's not really a system user anyway and "system" user uids/gids picked by the package maintscripts will be unaffected anyway | 14:33 |
clarkb | fungi: ok I don't know what the system: yes in the ansible that I cargo culted from the zookeeper ansible will do then | 14:35 |
clarkb | fungi: is there no other flag for system vs not? | 14:35 |
fungi | not sure what you mean by flag | 14:35 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/816769/1/playbooks/roles/gerritbot/tasks/main.yaml line 14 | 14:36 |
fungi | the adduser manpage explains what adduser --system does, there are behaviors it switches besides just which uid/gid range is used | 14:36 |
clarkb | maybe that is a noop if we set our own uids and gids | 14:36 |
clarkb | fungi: basically what I'm saying is we already do this for zuul/nodepool/zk. If this is a problem we not only need to rethink my gerritbot change but those systems as well potentially | 14:38 |
clarkb | but I'm not yet sure if there is a problem with what I have proposed | 14:39 |
clarkb | it just occurred to me overnight that there may be and it was worth considering | 14:39 |
fungi | we went through this some years ago when we were still using puppet, and concluded that the sanest option was to create a gap between LAST_SYSTEM_UID and FIRST_UID where we could create our static uids | 14:40 |
fungi | another alternative is to pick uids/gids strictly greater than LAST_UID and LAST_GID (59999 looks like) | 14:41 |
opendevreview | Alfredo Moralejo proposed opendev/system-config master: Add CentOS Stream 9 to AFS mirrors sync https://review.opendev.org/c/opendev/system-config/+/816852 | 14:42 |
clarkb | fungi: we do set UID_MIN etc in opendev/system-config/playbooks/roles/base/users/files/Debian/login.defs | 15:15 |
clarkb | I think that was the conclusion of the puppet stuff and it got ported to ansible | 15:15 |
fungi | oh, it's just not in adduser.conf | 15:16 |
clarkb | looking at that I guess I should set this new user to 60001? | 15:16 |
clarkb | (and then maybe one day we do the same with zuul/nodepool/zk?) | 15:16 |
clarkb | fungi: maybe we should hold a node for my proposed change then install a package that adds a user and confirm the behavior? | 15:19 |
fungi | so maybe adduser is obeying the values in login.defs rather than adduser.conf? | 15:20 |
clarkb | ya or not. That is why I'm wondering if we should test it. Seems like having a good answer to this is important to making whatever approach we take reliable as we apply it to other setups | 15:20 |
clarkb | basically invest in getting it right the first time then we can reapply that over and over again | 15:20 |
fungi | looks like useradd goes by what's in login.defs, while adduser relies on adduser.conf | 15:24 |
mordred | I love adduser vs useradd | 15:26 |
clarkb | I guess now we need to figure out which ansible uses? | 15:27 |
clarkb | fungi: I'm going to review some zuul changes, but then I'll come back to this and probably set up a held node and we can experiment with it. I do think it is worthwhile to sort out properly before we do this to too many things | 15:52 |
fungi | yeah, i agree | 15:52 |
fungi | also we should probably make login.defs and adduser.conf consistent | 15:53 |
fungi | one option would be to drop LAST_UID and LAST_GID to 9999 (that ought to be plenty for normal users without static uids/gids anyway) | 15:54 |
clarkb | oh ya I like that. Then we're no longer in conflict with the 10001 stuff | 15:58 |
*** marios is now known as marios|out | 16:16 | |
clarkb | tristanC: where is the Dockerfile for matrix-gerritbot? | 16:22 |
clarkb | I didn't see it in the matrix-gerritbot software repo | 16:22 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot https://review.opendev.org/c/opendev/system-config/+/816769 | 16:30 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user https://review.opendev.org/c/opendev/system-config/+/816770 | 16:30 |
clarkb | fungi: ^ I put a forced failure in testinfra tests and I'll make a node hold. | 16:31 |
fungi | excellent | 16:31 |
clarkb | heh I was going to clean up my old holds but they aren't there due to the zuul restarts. I'll check nodepool directly after this | 16:31 |
clarkb | frickler: corvus: ianw: you've each got at least one "leaked" hold node. I'm not sure if they are still in use or not so I'll leave them as is but if you get a chance to look on the nodepool side can you do that and delete the node(s) if no longer needed? | 16:36 |
clarkb | or I can clean them up if you don't need them anymore (frickler has an oom debug node, corvus a registry debug node and ianw buildkit and gerrit 3.4 nodes. of these I suspect at least the gerrit 3.4 node is still used) | 16:37 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run haproxy-statsd as uid 1000 https://review.opendev.org/c/opendev/system-config/+/816764 | 16:38 |
clarkb | apparently you can override user in a straightforward manner but group info has to be in the groups file on the image? I'll get another patchset up for zk's related change but I think that means we should prefer setting only the user in docker-compose.yaml | 16:39 |
clarkb | actually no zookeeper itself sets it uid:group too so I'll leave that one alone. | 16:40 |
clarkb | but the more I read docs the less sure I am this is correct :/ | 16:41 |
tristanC | clarkb: there is no Dockerfile, the image is built with nix | 16:45 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run haproxy-statsd as uid 1000 https://review.opendev.org/c/opendev/system-config/+/816764 | 16:45 |
frickler | clarkb: the oom debug node can be removed (that helped in identifying the bullseye qemu issue), but I have two questions: a) how did you link it to "oom debug" when that info is no longer in zuul? b) how to properly clean it up, just a "nodepool delete"? | 16:46 |
clarkb | frickler: if you do a nodepool list --detail | grep hold you get all of the held nodes including their detailed message from the zuul hold side. I used that to identify what it was held for. Then ya you nodepool delete $nodeid on a nodepool node | 16:47 |
fungi | frickler: nodepool list --detail | 16:47 |
fungi | the comment from the autohold gets copied into the node info | 16:47 |
tristanC | clarkb: is there something missing from the image? | 16:48 |
clarkb | tristanC: I was mostly curious as I'm looking at changing how the bot runs a little bit and thought it might be nice to look at. | 16:48 |
tristanC | clarkb: the image is defined as https://github.com/softwarefactory-project/gerritbot-matrix/blob/master/flake.nix#L56-L64 , here is the documentation https://nixos.org/manual/nixpkgs/stable/#sec-pkgs-dockerTools | 16:51 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot https://review.opendev.org/c/opendev/system-config/+/816769 | 16:51 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user https://review.opendev.org/c/opendev/system-config/+/816770 | 16:51 |
clarkb | tristanC: ^ related to that effort | 16:51 |
frickler | ah, I missed the --detail thing, thx. node deleted | 16:52 |
clarkb | frickler: thanks! | 16:52 |
clarkb | tristanC: I guess the container is defined to run as root in there? Will have to see how testing goes for overriding that | 16:53 |
clarkb | It was complaining about a lack of content in the groups file. I thought that was maybe related to a yaml parsing issue, but now wonder if no groups file exists at all and that is part of the issue? | 16:54 |
corvus | clarkb: deleted | 16:56 |
clarkb | my rough plan for today is to get the haproxy statsd update changes in and if that looks happy maybe go for the zookeeper statsd changes too | 16:58 |
clarkb | While also figuring out user stuff in 816769 | 16:58 |
tristanC | clarkb: the user is not defined, but yes it currently expects uid 0 for the home directory needed by openssh | 17:01 |
clarkb | I've approved the bullseye update for haproxy-statsd | 17:13 |
clarkb | fungi: if you can rereview https://review.opendev.org/c/opendev/system-config/+/816764 that would be great. I'm not sure if the : before was causing problems or not but this makes it more consistent with other setups. | 17:14 |
*** jpena is now known as jpena|off | 17:22 | |
clarkb | fungi: ok 158.69.73.60 is up and running. Do you think I should run a useradd and an adduser for both system and non system users and see what the results are? then maybe install a package that comes with a system user/group (libvirt?) | 17:32 |
clarkb | I'm going to start with the package install since that is easy to uninstall and purge | 17:35 |
fungi | yeah, those sound like good enough tests | 17:36 |
clarkb | huh libvirt doesn't add a user or group I swear that it did /me checks devstack | 17:37 |
fungi | well, also packages which add users are going to do so in the system (100-999 by default) range | 17:37 |
fungi | clarkb: try installing snmpd? | 17:38 |
fungi | it creates a Debian-snmp user and group | 17:38 |
clarkb | I needed libvirt-daemon-system apparently | 17:40 |
clarkb | looks like libvirt-dnsmasq got uid 112 which is the next system uid in the low range | 17:40 |
clarkb | (thats a good indication we aren't breaking this too much) | 17:40 |
clarkb | I'll try snmpd and then adduser/useradd stuff | 17:40 |
clarkb | fungi: snmp is already there | 17:40 |
clarkb | from before we add our stuff on top | 17:40 |
fungi | oh, right, this is less a test node and more a deployed server in that regard | 17:42 |
fungi | clarkb: looking at my workstation, try installing usbmuxd | 17:44 |
fungi | should create a usbmux user | 17:44 |
fungi | or try tcpdump, or postgresql | 17:44 |
fungi | tcpdump might be the best test since it has few deps and creates both a user and a group, whereas usbmux only creates a user | 17:46 |
clarkb | fungi: https://paste.opendev.org/show/bRbYPCBvUhstKOGyCnNs/ | 17:52 |
clarkb | it seems that adduser/addgroup do what we want but useradd/groupadd do not | 17:53 |
clarkb | It seems that login.defs are ignored by adduser hence adding the 1000 uid | 17:53 |
clarkb | then useradd respects login.defs and adds the user as biggest uid +1 | 17:54 |
clarkb | To make these somewhat consistent with each other I guess we reduce our max values down to say 9999 in both login.defs and adduser.conf? | 17:54 |
clarkb | also anecdotally it seems that package installs use adduser and not useradd, but they do system users and they all end up in the low range anyway | 17:55 |
clarkb | heh we already include tcpdump in our test images so it is early in the list | 17:55 |
clarkb | I'm going to manually edit login.defs uid max and see if it shifts down as we want | 17:56 |
fungi | yeah, i say we adjust the max values in both login.defs and adduser.conf, and also adjust the regular minimums in adduser.conf to be consistent with our current login.defs | 17:59 |
fungi | that gives us the option of either putting static users/groups for containerized services between 1001 and 1999, or above 9999 | 18:00 |
fungi | we seem to start our admin users at 2000 | 18:00 |
clarkb | ok confirmed lowering the maxes makes useradd and groupadd respect them even if there are higher values existing | 18:00 |
clarkb | fungi: yup I'm thinking going higher than 9999 for containerized services will work well since we already do that for a number of them like zuul/nodepool/zk | 18:00 |
clarkb | gerrit is special for raisins not worth changing but if everything else can match up that way I think that would be good | 18:00 |
fungi | though we should probably avoid values over 60000 | 18:01 |
clarkb | fungi: ya I doubt we'd need to go that high | 18:01 |
clarkb | libvirt uses a high value like that fwiw | 18:01 |
clarkb | so it seems some system packages may also explicitly go high | 18:01 |
fungi | right | 18:01 |
clarkb | fungi: did you want to put that change together since you spec'd it out earlier (all I did was some basic testing to confirm behavior) | 18:02 |
fungi | yeah, i can, just a sec | 18:02 |
clarkb | And then ya we can do 2000-9999 for system level normal users (sorry if that statement didn't make sense). The distro continues to have 0-999 for system level system users then we can use >=10000 for our container users | 18:03 |
clarkb | ah actually the ranges are this: 0-999 for distro system users/groups, 1000-1999 unallocated, 2000-2999 infra-root users, 3000-9999 non system users created by config mgmt, >=10000 free for use in containers | 18:06 |
clarkb | spot checking things on nb0X we have letsencrypt as gid 10002 because nodepool group is 10001. This implies ansible is using useradd/groupadd and not addgroup/adduser | 18:09 |
clarkb | I think this is ok and the next redeployment of those services will end up correcting that sort of thing | 18:10 |
clarkb | any objection to me approving 816764 now? | 18:11 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Lower UID/GID range max to make way for containers https://review.opendev.org/c/opendev/system-config/+/816869 | 18:11 |
opendevreview | Merged opendev/system-config master: Update haproxy-statsd to bullseye and python3.9 https://review.opendev.org/c/opendev/system-config/+/816765 | 18:12 |
fungi | clarkb: no objection | 18:12 |
fungi | also 816869 is the account range adjustments as discussed | 18:12 |
clarkb | yup +2'd as noted from my spot checking letsencrypt on nodepool and zuul nodes will be weird, but that will sort of self correct over time | 18:13 |
clarkb | if we really wanted to we could probably chown all the 10002 group stuff over to say 3000 or whatever the actual next value is | 18:14 |
clarkb | I'm thinking things like 816869 and 816771 might be best on not a friday. I won't object if others want to push them in but I'd like to get a bike ride in today as there is no rain and generally not worry about fixing that up over the weekend if it has a sad :) | 18:16 |
clarkb | tristanC: so is there no way to run this as a different group? docker says docker: Error response from daemon: unable to find group 11000-i: no matching entries in group file. I am able to override uid:group for other containers that don't set the group either. This makes me wonder if it is something about the /etc/group file not existing at all? | 18:20 |
clarkb | oh wait I think I see it, that is a bug in my script edit | 18:22 |
clarkb | ugh | 18:22 |
clarkb | the -i is important :( | 18:22 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot https://review.opendev.org/c/opendev/system-config/+/816769 | 18:24 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user https://review.opendev.org/c/opendev/system-config/+/816770 | 18:24 |
clarkb | fungi: ^ thats rebased on top of your change with the test force failure removed | 18:24 |
tristanC | clarkb: i think you would need to keep the image default user, but you should be able to map it to an arbitrary host uid. Otherwise I can bake the uid you need in the image | 18:24 |
tristanC | i mean if that is easier for you | 18:25 |
clarkb | tristanC: isn't it just running a process though? so as long as we bind mount the files with the correct perms we are good? | 18:25 |
clarkb | I think it would only be a problem if the executable isn't +x or readable for the different uid/gid or the bot tries to write to somewhere that needs different perms | 18:25 |
tristanC | clarkb: i'm not entirely sure how docker handle the --user arg, i guess what needs to be checked is that the ~/.ssh/id_rsa key is readable | 18:29 |
tristanC | clarkb: fwiw here are my notes about rootless podman with regards to sharing host uid with container: https://github.com/podenv/podenv/blob/main/docs/references/userns.md | 18:29 |
clarkb | tristanC: yup the rest of that change chmods the contents of the bind mount to match | 18:30 |
clarkb | tristanC: for mapping my concern with that is it seems docker uses a single mapping? which means that if you want to map uid 0 in one container to user foo and uid 0 in another container to user bar you can't? But maybe I'm missing something important there | 18:31 |
clarkb | hrm https://review.opendev.org/c/opendev/system-config/+/816765 only ran the promote job for the image in deploy | 18:54 |
clarkb | I guess I'll need to manually pull and restart the container when the second change lands | 18:54 |
clarkb | I'll go ahead and do that now for the first change update | 18:54 |
clarkb | that also updated the haproxy image so it will restart as well. Should be quick. Any objection to me doing that now? | 18:56 |
fungi | no objection | 18:57 |
* clarkb goes for it | 18:57 | |
clarkb | thats done I can reach opendev.org | 18:57 |
clarkb | https://grafana.opendev.org/d/ZQmopePMz/opendev-load-balancer?orgId=1&from=now-5m&to=now shows new data appears to be arriving | 18:58 |
clarkb | I'll repeat this once the second change lands and that one likely won't want to restart haproxy, just the statsd process | 18:58 |
fungi | rackspace opened a ticket saying there's an outage or impending outage for the cacti trove instance, i haven't had a chance to look into it yet | 18:59 |
clarkb | I'm going to eat lunch and when that is done plan to do the second haproxy-statsd restart | 19:05 |
opendevreview | Merged opendev/system-config master: Run haproxy-statsd as uid 1000 https://review.opendev.org/c/opendev/system-config/+/816764 | 19:28 |
clarkb | deploy jobs updated ^ automatically and https://grafana.opendev.org/d/ZQmopePMz/opendev-load-balancer?orgId=1&from=now-5m&to=now still shows new data arriving and the uids lgtm | 19:39 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/816762/ is probably reasonably safe to land now as a result? | 19:40 |
clarkb | fungi: ^ what do you think? I am probably going to be in and out today as I try to enjoy the lack of rain | 19:40 |
fungi | yep, i've approved it now | 19:43 |
fungi | thanks! | 19:43 |
fungi | we're gearing up for a prolonged wind event here, so need to switch gears shortly to rearrange things on the deck before i'm stuck doing it in the middle of a tempest | 19:44 |
mordred | fungi: you should rig up some lines and pulleys so that you can treat the house like a sailboat and set the deck for the prevailing wind direction | 20:06 |
clarkb | Fungi's Moving Castle | 20:07 |
fungi | yeah, for now i'm just battening down the hatches | 20:19 |
fungi | also an aggressive last-minute vegetable harvest, because whatever we don't take off the plants, the wind will | 20:21 |
clarkb | corvus: service-zuul is failing because of zuul01 not having LE configuration | 20:30 |
corvus | clarkb: ah, thx. i'll see if i can fix that | 20:34 |
clarkb | the apache startup fails as a result is the actual error if I'm reading logs correctly | 20:35 |
corvus | that makes sense; maybe we should just go ahead and add the LE stuff to zuul01 even tho we're not using it | 20:42 |
corvus | either that, or remove the apache stuff | 20:42 |
clarkb | ya I think adding LE to it would be fine. Theoretically we'll want that in the near future anyway? | 20:43 |
corvus | unless we want to separate web from schedulers | 20:44 |
clarkb | ah | 20:44 |
corvus | i haven't thought that much ahead, and it may not be worth thinking ahead until we see resource usage of both of them | 20:44 |
corvus | under the new regime | 20:44 |
corvus | (does a zuul-web take up a lot of ram because it's a sort-of-scheduler?) | 20:45 |
opendevreview | Merged opendev/system-config master: Run zookeeper-statsd as the zookeeper user https://review.opendev.org/c/opendev/system-config/+/816762 | 20:46 |
opendevreview | Merged opendev/system-config master: Update zookeeper-statsd to python3.9 on bullseye https://review.opendev.org/c/opendev/system-config/+/816763 | 20:46 |
clarkb | I'm not sure if ^ will have the same issue as the haproxy statsd update where the bullseye update doesn't actually trigger deploy jobs. I'll manually pull and up -d if necessary | 20:46 |
opendevreview | James E. Blair proposed opendev/system-config master: Add LE config for zuul01 https://review.opendev.org/c/opendev/system-config/+/816903 | 20:48 |
clarkb | hrm the promote and the service-zookeeper jobs seem to run concurrently as well | 20:51 |
clarkb | the promote is probably quick enough that this isn't a real problem but I guess something else to look at | 20:52 |
clarkb | the zookeeper image has updated as well so its going through that similar to updating the haproxy. The playbook does one zk at a time to avoid outages | 20:54 |
clarkb | oh wait no the zookeeper images were already up to date | 20:56 |
clarkb | only the stats restarted. The order of the docker ps output changed | 20:56 |
clarkb | looking at grafana we are still getting zk data so I think the first update is happy. | 20:57 |
clarkb | I'm likely to be doing a school run during the second change's pass due to the hourly jobs queueing up here in a minute or two | 20:59 |
clarkb | but I don't expect any problems now that the first one is in and happy | 20:59 |
clarkb | corvus: check the note I left on 816903 | 21:01 |
clarkb | corvus: I think you may want to add in the borg backup excludes too | 21:01 |
clarkb | maybe those should go in a group var though | 21:01 |
clarkb | infra-root I've approved the mailman3 spec as minor feedback updates were made only since we discussed at our last meeting | 21:03 |
clarkb | confirmed that the deploy job is the only one queued for the bullseye update on that container image. I'll manually pull and up -d when i get back from the school run | 21:05 |
corvus | oh i missed a git add, sorry | 21:07 |
opendevreview | James E. Blair proposed opendev/system-config master: Add LE config for zuul01 https://review.opendev.org/c/opendev/system-config/+/816903 | 21:07 |
corvus | that's what i thought i pushed :) | 21:08 |
opendevreview | Merged opendev/infra-specs master: Add a specification for Mailman 3 https://review.opendev.org/c/opendev/infra-specs/+/810990 | 21:08 |
*** jonher_ is now known as jonher | 21:28 | |
*** yoctozepto8 is now known as yoctozepto | 21:28 | |
ianw | clarkb: if you still have that window open, you can remove the buildkit nodes, the gerrit 3.4 one i still have up for poking at | 21:56 |
ianw | (otherwise i'll do it later) | 21:56 |
fungi | ianw: i'll clean them up | 21:58 |
fungi | and thanks! | 21:58 |
fungi | hope your saturday isn't intolerable | 21:58 |
ianw | so far so good :) | 21:59 |
clarkb | fungi: thanks. | 21:59 |
clarkb | I'm about to do the statsd update on the zks | 21:59 |
fungi | i'm trying to figure out why setuptools seems to have suddenly become uninstallable in tripleo and devstack jobs as of about an hour ago | 22:00 |
ianw | looks like we've got a 9-stream image ready in rax-ord, so that's good | 22:00 |
ianw | friday night, let's release! | 22:01 |
fungi | ianw: the 9-stream work broke centos-7 image builds, there's a simple fix up | 22:01 |
* fungi checks | 22:01 | |
clarkb | fungi: if you have a link to one of those failures I can take a look as well once zk stats are done | 22:01 |
fungi | https://review.opendev.org/816813 | 22:01 |
fungi | it's project-config, so shouldn't block dib releases | 22:02 |
fungi | clarkb: there are several linked in #openstack-infra in the last few minutes | 22:02 |
ianw | yeah, that looks fine. we probably need to think about that matching for 10-stream, but one thing at a time | 22:02 |
fungi | i was worried it could be related to the pbr bump, but seems to have started up hours after | 22:02 |
clarkb | ah cool I haven't caught up on all irc channels yet | 22:03 |
clarkb | fungi: ya the pbr bump was from wednesday | 22:03 |
clarkb | or early yesterday. It doesn't handle pyproject.toml properly from what I've seen but existing users should be fine | 22:03 |
fungi | the reqs update for the pbr bump merged around 1700 today | 22:03 |
fungi | but still far earlier than these errors began, it seems | 22:03 |
clarkb | pbr is setup_requires so requirements shouldn't really affect it much | 22:04 |
fungi | right | 22:04 |
fungi | ianw: the nodes held for buildkit debugging are deleted now | 22:05 |
clarkb | ok I'm happy with zk stats. Updated the container on all three nodes and still getting data on grafana dashboards | 22:05 |
fungi | awesome | 22:05 |
fungi | oh, that reminds me, i was going to check on that rax ticket about the cacti db | 22:05 |
fungi | i'll try to take a look at that shortly | 22:06 |
opendevreview | Merged openstack/project-config master: Fix haveged installation in CentOS7 https://review.opendev.org/c/openstack/project-config/+/816813 | 22:12 |
fungi | okay, so there were two tickets, one for the db behind wiki.openstack.org and one for the db behind cacti.openstack.org | 22:58 |
fungi | both services seem to be working fine now though | 23:01 |
corvus | my current plan is to restart zuul on master tomorrow morning and try a 2nd scheduler again. i'll let it run as long as it runs without errors. if there's an error, i'll triage it and may try to roll forward if it seems tractable. | 23:17 |
corvus | (otherwise clear state and roll back to .4) | 23:17 |
fungi | i expect to be around, so happy to help or just keep an eye on it | 23:17 |
corvus | i think it'll be interesting either way :) | 23:18 |
fungi | yep! | 23:18 |
fungi | seems like this is getting really close | 23:18 |
corvus | yep i think so | 23:19 |
Clark[m] | Sounds good. Not sure how much I'll be around though | 23:30 |
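The ID ranges clarkb summarized at 18:06, plus fungi's advice at 18:01 to stay below 60000, can be captured in a small lookup table. This is a hypothetical helper, not code from opendev/system-config; treating 59999 as the top of the container range is an assumption based on the LAST_UID/LAST_GID value mentioned at 14:25.

```python
# Hypothetical helper (not part of opendev/system-config). Boundaries follow
# the plan from the log; the 59999 upper bound is an assumption.
RANGES = [
    (0, 999, "distro system users/groups"),
    (1000, 1999, "unallocated"),
    (2000, 2999, "infra-root users"),
    (3000, 9999, "non-system users created by config management"),
    (10000, 59999, "static IDs for containerized services"),
]


def classify_uid(uid: int) -> str:
    """Return the planned purpose of a UID/GID under the ranges above."""
    if uid < 0:
        raise ValueError("UIDs and GIDs are non-negative")
    for low, high, purpose in RANGES:
        if low <= uid <= high:
            return purpose
    return "outside the planned ranges (60000+); avoid"


if __name__ == "__main__":
    for uid in (112, 2000, 10001, 11000, 60001):
        print(uid, "->", classify_uid(uid))
```

Under this plan, 11000 (the gerritbot default discussed at 14:23) lands in the container range, while 60001 (floated at 15:16) would fall outside it.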
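The held-node experiment (17:52 to 18:00) showed adduser and useradd picking IDs differently. A simplified model of what was observed: useradd takes one above the highest ID already inside [UID_MIN, UID_MAX], so IDs outside that window are ignored, while adduser takes the first free ID counting up from FIRST_UID. The sketch below models only that behavior with made-up inputs. It is not derived from the shadow-utils or adduser source and ignores details such as what real useradd does when the top of its range is used up.

```python
# Simplified model of the two allocation strategies observed on the held node.
# Not derived from the shadow-utils or adduser source; edge cases are ignored.


def useradd_next(used: list[int], uid_min: int, uid_max: int) -> int:
    """useradd-style: one above the highest ID already inside [uid_min, uid_max]."""
    in_range = [u for u in used if uid_min <= u <= uid_max]
    candidate = max(in_range) + 1 if in_range else uid_min
    if candidate > uid_max:
        # Real useradd may search for gaps here; this model just gives up.
        raise ValueError("no ID left above the highest one in range")
    return candidate


def adduser_next(used: list[int], first_uid: int, last_uid: int) -> int:
    """adduser-style: the lowest unused ID in [first_uid, last_uid]."""
    taken = set(used)
    for uid in range(first_uid, last_uid + 1):
        if uid not in taken:
            return uid
    raise ValueError("range exhausted")


if __name__ == "__main__":
    # Hypothetical host: two infra-root users, one config-managed user,
    # and a container service account at 10001 (like nodepool on nb0X).
    used = [2000, 2001, 3000, 10001]
    print(useradd_next(used, 3000, 59999))  # 10002
    print(useradd_next(used, 3000, 9999))   # 3001
    print(adduser_next(used, 1000, 59999))  # 1000
```

The first call mirrors the letsencrypt group landing on 10002 next to nodepool's 10001 (18:09). The second shows why lowering the maximums to 9999, as proposed at 15:54, keeps useradd out of the container range.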
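Because useradd reads login.defs while adduser reads adduser.conf (15:24), the two files can drift apart, which is what fungi suggested fixing at 15:53. A check along these lines could flag that drift. The key names (UID_MIN/UID_MAX/GID_MIN/GID_MAX and FIRST_UID/LAST_UID/FIRST_GID/LAST_GID) are the standard ones for those files. The helper functions and sample contents are hypothetical, and the parsers only handle blank lines and whole-line `#` comments.

```python
# Hypothetical consistency check between login.defs (used by useradd)
# and adduser.conf (used by adduser). Parsers are deliberately minimal.

KEY_PAIRS = {
    "UID_MIN": "FIRST_UID",
    "UID_MAX": "LAST_UID",
    "GID_MIN": "FIRST_GID",
    "GID_MAX": "LAST_GID",
}


def parse_login_defs(text: str) -> dict:
    """Parse 'KEY VALUE' lines, skipping blanks and '#' comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            values[parts[0]] = parts[1]
    return values


def parse_adduser_conf(text: str) -> dict:
    """Parse shell-style 'KEY=VALUE' lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def mismatches(login_defs: str, adduser_conf: str) -> list:
    """Return (login.defs key, value, adduser.conf key, value) for each disagreement."""
    ld = parse_login_defs(login_defs)
    ac = parse_adduser_conf(adduser_conf)
    return [
        (ld_key, ld.get(ld_key), ac_key, ac.get(ac_key))
        for ld_key, ac_key in KEY_PAIRS.items()
        if ld.get(ld_key) != ac.get(ac_key)
    ]


if __name__ == "__main__":
    # Made-up file contents for illustration only.
    login_defs = "UID_MIN 3000\nUID_MAX 9999\nGID_MIN 3000\nGID_MAX 9999\n"
    adduser_conf = "FIRST_UID=1000\nLAST_UID=59999\nFIRST_GID=1000\nLAST_GID=59999\n"
    for row in mismatches(login_defs, adduser_conf):
        print(row)
```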
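The docker error at 18:20 ("unable to find group 11000-i") came from a stray `-i` ending up in the group half of the user setting. Docker's documentation says a numeric user ID does not need to exist in the container; the assumption here, worth verifying, is that numeric group IDs are likewise used as-is, while anything non-numeric has to be resolved by name in the image's /etc/passwd or /etc/group. A hypothetical pre-flight check for a docker-compose `user:` value might look like this:

```python
import re

# Hypothetical pre-flight check for a docker-style "user[:group]" value.
# Assumption: numeric IDs need no passwd/group entry inside the image,
# while names must be resolvable there.
NUMERIC = re.compile(r"[0-9]+")


def check_user_spec(spec: str) -> list:
    """Return the parts of spec that would need a name lookup in the image."""
    user, sep, group = spec.partition(":")
    if not user or (sep and not group):
        raise ValueError(f"malformed user spec: {spec!r}")
    return [part for part in (user, group) if part and not NUMERIC.fullmatch(part)]


if __name__ == "__main__":
    print(check_user_spec("11000:11000"))    # []
    print(check_user_spec("11000:11000-i"))  # ['11000-i']
```

A non-empty result suggests the container will fail to start unless the image defines that name, which matches the symptom seen with the nix-built matrix-gerritbot image.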