Friday, 2021-11-05

opendevreview	Clark Boylan proposed opendev/system-config master: Cleanup users launch-node.py might have used https://review.opendev.org/c/opendev/system-config/+/816771	00:02
opendevreview	Clark Boylan proposed opendev/system-config master: Remove python3-nova-agent package from our servers https://review.opendev.org/c/opendev/system-config/+/816772	00:02
opendevreview	Merged opendev/system-config master: Don't set lodgeit db dir perms https://review.opendev.org/c/opendev/system-config/+/816754	00:04
ianw	excellent, the 9-stream build fails with some sort of yaml library exception : TypeError: load() missing 1 required positional argument: 'Loader'	00:19
ianw	this must be pyyaml 6	00:21
opendevreview	Ian Wienand proposed openstack/project-config master: nodepool elements: use yaml.safe_load https://review.opendev.org/c/openstack/project-config/+/816774	00:26
ianw	^ this explains why it passed dib/nodepool gate	00:26
Clark[m]	Ya you need to explicitly opt into unsafe now	00:26
*** odyssey4me is now known as Guest4951		00:56
opendevreview	Merged openstack/project-config master: nodepool elements: use yaml.safe_load https://review.opendev.org/c/openstack/project-config/+/816774	01:08
ianw	sigh, now another problem	02:13
ianw	2021-11-05 01:51:07.189 | Updating cache of https://opendev.org/openstack/openstack.git in /opt/dib_cache/source-repositories/openstack_179b61797588a5983c2f97c6533dca570c8f887d with ref *	02:13
ianw	2021-11-05 01:51:13.993 | Could not access submodule 'adjutant'	02:13
ianw	2021-11-05 01:51:13.993 | Could not access submodule 'ansible-hardening'	02:13
ianw	2021-11-05 01:51:13.993 | Could not access submodule 'ansible-role-collect-logs'	02:13
ianw	... and so on ...	02:13
ianw	... is dib broken, something about a new git, gitea, or this repo ...	02:13
*** sshnaidm is now known as sshnaidm|off		03:06
Clark[m]	Is new git trying to auto update submodules?	03:19
ianw	https://nb01.opendev.org/ubuntu-bionic-0000221037.log failed with this error	03:22
ianw	https://nb01.opendev.org/ubuntu-focal-0000119628.log was the next build and passed	03:23
Clark[m]	The urls are relative so those updates should work however I wouldn't expect the caching step to actually need to fetch the submodule content	03:25
Clark[m]	Re new git I wonder if that is a bullseye git behavior update where it actually fetches those?	03:26
ianw	https://nb02.opendev.org/debian-stretch-0000051795.log also failed	03:26
Clark[m]	Since source-repositories runs outside the chroot it would be the bullseye git?	03:26
Clark[m]	Honestly that repo isn't used for anything that I know of and is incomplt these days. We might get away with removing it from caching temporarily if necessary	03:27
ianw	https://nb02.opendev.org/debian-buster-0000061362.log was the build that followed that, and that passed	03:27
Clark[m]	*it is incomplete	03:28
ianw	it seems like possibly running the new git failed once -- may have done something?! -- and further runs are working	03:28
Clark[m]	ianw: maybe we need to see if it affects those older systems specifically? Then my hunch about the chroot is likely wrong	03:28
Clark[m]	Oh I see. Ya maybe something about git repo state on the first pass then git doesn't try again?	03:29
ianw	this would be running with the build-system git, not the guest git	03:29
ianw	i can't seem to replicate it -- but also the git command-lines run by the caching are ... let's say not a priori obvious	03:30
ianw	in short, it happened, twice, on different servers, i don't know why and it doesn't seem to be happening now	03:31
opendevreview	Ian Wienand proposed opendev/system-config master: gerrit: mark file reviewed during testing https://review.opendev.org/c/opendev/system-config/+/816766	03:44
*** frenzyfriday|sick is now known as frenzy_friday		04:25
ianw	it just happened again with https://nb01.opendev.org/centos-7-0000237899.log	04:55
ianw	the list of submodules is different now	04:55
ianw	2021-11-05 04:50:44.107 | Updating cache of https://opendev.org/openstack/openstack.git in /opt/dib_cache/source-repositories/openstack_179b61797588a5983c2f97c6533dca570c8f887d with ref *	04:55
ianw	2021-11-05 04:50:48.309 | Could not access submodule 'freezer-tempest-plugin'	04:55
ianw	2021-11-05 04:50:48.309 | Could not access submodule 'python-ironicclient'	04:55
ianw	i'm just about out of time for today, i'm not going to get to dig on this much further	04:56
opendevreview	Ian Wienand proposed opendev/system-config master: gerrit: mark file reviewed during testing https://review.opendev.org/c/opendev/system-config/+/816766	05:08
opendevreview	Ian Wienand proposed openstack/project-config master: infra-package-needs: skip haveged start on 9-stream https://review.opendev.org/c/openstack/project-config/+/816782	06:41
opendevreview	Merged openstack/project-config master: infra-package-needs: skip haveged start on 9-stream https://review.opendev.org/c/openstack/project-config/+/816782	07:00
opendevreview	Merged opendev/system-config master: gerrit: don't chown mariadb container directory https://review.opendev.org/c/opendev/system-config/+/816750	09:25
opendevreview	Alfredo Moralejo proposed openstack/project-config master: Fix haveged installation in CentOS7 https://review.opendev.org/c/openstack/project-config/+/816813	10:07
*** jpena|off is now known as jpena		10:36
*** dviroel|out is now known as dviroel|rover		10:38
*** jssfr is now known as foorl		11:33
*** mazzy50981 is now known as mazzy5098		12:00
fungi	Clark[m]: ianw: i agree, the openstack/openstack repo is unnecessary to cache, i'd be in favor of filtering it out of the repos list explicitly	13:45
opendevreview	Andre Aranha proposed zuul/zuul-jobs master: Add fips version of jobs needed for OpenStack https://review.opendev.org/c/zuul/zuul-jobs/+/816385	14:18
clarkb	fungi: re the user changes in https://review.opendev.org/c/opendev/system-config/+/816769/1/playbooks/roles/gerritbot/tasks/main.yaml we are defaulting to uid 11000 with the idea that other bot users could be 11001 etc or 12000 and so on. What I'm not sure about and questioning now is if we install a distro package that creates a new system user will it create that as uid 11001 after	14:23
clarkb	we create that user?	14:23
fungi	it will if 11000 is in the range adduser is willing to use	14:24
fungi	on review.o.o, adduser.conf lists LAST_UID=59999 and LAST_GID=59999 so, yes it will assume no new users should be created lower than 11000 if there's a 11000 in passwd/groups	14:25
clarkb	fungi: a better example might be zk04.opendev.org. Since that has zk running as 10001 (I used it as an example here)	14:26
fungi	at one point we were setting FIRST_UID and FIRST_GID higher than 1000 i thought, but doesn't look like it now	14:26
clarkb	so I guess the next question I have is: is this a problem, and do we need to fix the existing zuul/nodepool/zk uid assignments?	14:27
clarkb	fungi: I suppose as an alternative I can create it as a non system user?	14:28
fungi	it can become a problem if we rsync files or detach/attach a cinder volume during server replacements	14:28
clarkb	then system packages will continue to use their normal range and we already manage our actual users with specific uids so that won't conflict?	14:28
fungi	not sure what you mean by non system user	14:29
clarkb	fungi: a regular user eg system: false/no in that ansible task	14:30
fungi	if you're talking about adduser --system that normally causes it to pick a uid/gid from the "system" range rather than the normal user range	14:30
clarkb	fungi: yes I know. We have created the zuul/nodepool/zookeeper users with a high (10001) uid and set it as a system user in ansible	14:31
clarkb	I'm wondering if it would be better to not create these users as system users so that system packages can continue to pick from the normal range	14:31
clarkb	normal range for system users I mean	14:31
clarkb	then our non system users can have hardcoded uids and since we manage those directly we can keep them from getting too out of sync?	14:32
fungi	i still don't understand. if we explicitly picked a uid/gid outside the "system" range then it's not really a system user anyway and "system" user uids/gids picked by the package maintscripts will be unaffected anyway	14:33
clarkb	fungi: ok I don't know what the system: yes in the ansible that I cargo culted from the zookeeper ansible will do then	14:35
clarkb	fungi: is there no other flag for system vs not?	14:35
fungi	not sure what you mean by flag	14:35
clarkb	https://review.opendev.org/c/opendev/system-config/+/816769/1/playbooks/roles/gerritbot/tasks/main.yaml line 14	14:36
fungi	the adduser manpage explains what adduser --system does, there are behaviors it switches besides just which uid/gid range is used	14:36
clarkb	maybe that is a noop if we set our own uids and gids	14:36
clarkb	fungi: basically what I'm saying is we already do this for zuul/nodepool/zk. If this is a problem we not only need to rethink my gerritbot change but those systems as well potentially	14:38
clarkb	but I'm not yet sure if there is a problem with what I have proposed	14:39
clarkb	it just occurred to me overnight that there may be and it was worth considering	14:39
fungi	we went through this some years ago when we were still using puppet, and concluded that the sanest option was to create a gap between LAST_SYSTEM_UID and FIRST_UID where we could create our static uids	14:40
fungi	another alternative is to pick uids/gids strictly greater than LAST_UID and LAST_GID (59999 looks like)	14:41
opendevreview	Alfredo Moralejo proposed opendev/system-config master: Add CentOS Stream 9 to AFS mirrors sync https://review.opendev.org/c/opendev/system-config/+/816852	14:42
clarkb	fungi: we do set UID_MIN etc in opendev/system-config/playbooks/roles/base/users/files/Debian/login.defs	15:15
clarkb	I think that was the conclusion of the puppet stuff and it got ported to ansible	15:15
fungi	oh, it's just not in adduser.conf	15:16
clarkb	looking at that I guess I should set this new user to 60001?	15:16
clarkb	(and then maybe one day we do the same with zuul/nodepool/zk?)	15:16
clarkb	fungi: maybe we should hold a node for my proposed change then install a package that adds a user and confirm the behavior?	15:19
fungi	so maybe adduser is obeying the values in login.defs rather than adduser.conf?	15:20
clarkb	ya or not. That is why I'm wondering if we should test it. Seems like having a good answer to this is important to making whatever approach we take reliable as we apply it to other setups	15:20
clarkb	basically invest in getting it right the first time then we can reapply that over and over again	15:20
fungi	looks like useradd goes by what's in login.defs, while adduser relies on adduser.conf	15:24
mordred	I love adduser vs useradd	15:26
clarkb	I guess now we need to figure out which ansible uses?	15:27
clarkb	fungi: I'm going to review some zuul changes, but then I'll come back to this and probably set up a held node and we can experiment with it. I do think it is worthwhile to sort out properly before we do this to too many things	15:52
fungi	yeah, i agree	15:52
fungi	also we should probably make login.defs and adduser.conf consistent	15:53
fungi	one option would be to drop LAST_UID and LAST_GID to 9999 (that ought to be plenty for normal users without static uids/gids anyway)	15:54
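Keeping /etc/login.defs (read by useradd) and /etc/adduser.conf (read by adduser) consistent, as suggested here, could be checked with a small script. This is a hypothetical sketch: the key pairs are the standard Debian names, but the function names and the sample values are made up for illustration and are not taken from system-config.

```python
# Hypothetical consistency check between login.defs and adduser.conf.
# Maps each login.defs key to the adduser.conf key that plays the same role.
KEY_PAIRS = {
    "UID_MIN": "FIRST_UID",
    "UID_MAX": "LAST_UID",
    "GID_MIN": "FIRST_GID",
    "GID_MAX": "LAST_GID",
}


def parse_login_defs(text):
    """Parse login.defs style 'KEY VALUE' lines; '#' starts a comment."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            values[parts[0]] = parts[1].strip()
    return values


def parse_adduser_conf(text):
    """Parse adduser.conf style shell assignments 'KEY=VALUE'."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip().strip('"')
    return values


def find_mismatches(login_defs_text, adduser_conf_text):
    """Return (defs_key, defs_value, conf_key, conf_value) for every pair that differs."""
    defs = parse_login_defs(login_defs_text)
    conf = parse_adduser_conf(adduser_conf_text)
    mismatches = []
    for defs_key, conf_key in KEY_PAIRS.items():
        if defs.get(defs_key) != conf.get(conf_key):
            mismatches.append((defs_key, defs.get(defs_key), conf_key, conf.get(conf_key)))
    return mismatches
```

With an illustrative login.defs capped at 9999 and a stock adduser.conf ending at 59999, all four pairs would be reported until adduser.conf is lowered to match.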
clarkb	oh ya I like that. Then we're no longer in conflict with the 10001 stuff	15:58
*** marios is now known as marios|out		16:16
clarkb	tristanC: where is the Dockerfile for matrix-gerritbot?	16:22
clarkb	I didn't see it in the matrix-gerritbot software repo	16:22
opendevreview	Clark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot https://review.opendev.org/c/opendev/system-config/+/816769	16:30
opendevreview	Clark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user https://review.opendev.org/c/opendev/system-config/+/816770	16:30
clarkb	fungi: ^ I put a forced failure in testinfra tests and I'll make a node hold.	16:31
fungi	excellent	16:31
clarkb	heh I was going to clean up my old holds but they aren't there due to the zuul restarts. I'll check nodepool directly after this	16:31
clarkb	frickler: corvus: ianw: you've each got at least one "leaked" hold node. I'm not sure if they are still in use or not so I'll leave them as is but if you get a chance to look on the nodepool side can you do that and delete the node(s) if no longer needed?	16:36
clarkb	or I can clean them up if you don't need them anymore (frickler has an oom debug node, corvus a registry debug node and ianw buildkit and gerrit 3.4 nodes. of these I suspect at least the gerrit 3.4 node is still used)	16:37
opendevreview	Clark Boylan proposed opendev/system-config master: Run haproxy-statsd as uid 1000 https://review.opendev.org/c/opendev/system-config/+/816764	16:38
clarkb	apparently you can override user in a straightforward manner but group info has to be in the groups file on the image? I'll get another patchset up for zk's related change but I think that means we should prefer setting only the user in docker-compose.yaml	16:39
clarkb	actually no zookeeper itself sets it uid:group too so I'll leave that one alone.	16:40
clarkb	but the more I read docs the less sure I am this is correct :/	16:41
tristanC	clarkb: there is no Dockerfile, the image is built with nix	16:45
opendevreview	Clark Boylan proposed opendev/system-config master: Run haproxy-statsd as uid 1000 https://review.opendev.org/c/opendev/system-config/+/816764	16:45
frickler	clarkb: the oom debug node can be removed (that helped in identifying the bullseye qemu issue), but I have two questions: a) how did you link it to "oom debug" when that info is no longer in zuul? b) how to properly clean it up, just a "nodepool delete"?	16:46
clarkb	frickler: if you do a nodepool list --detail | grep hold you get all of the held nodes including their detailed message from the zuul hold side. I used that to identify what it was held for. Then ya you nodepool delete $nodeid on a nodepool node	16:47
fungi	frickler: nodepool list --detail	16:47
fungi	the comment from the autohold gets copied into the node info	16:47
tristanC	clarkb: is there something missing from the image?	16:48
clarkb	tristanC: I was mostly curious as I'm looking at changing how the bot runs a little bit and thought it might be nice to look at.	16:48
tristanC	clarkb: the image is defined as https://github.com/softwarefactory-project/gerritbot-matrix/blob/master/flake.nix#L56-L64 , here is the documentation https://nixos.org/manual/nixpkgs/stable/#sec-pkgs-dockerTools	16:51
opendevreview	Clark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot https://review.opendev.org/c/opendev/system-config/+/816769	16:51
opendevreview	Clark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user https://review.opendev.org/c/opendev/system-config/+/816770	16:51
clarkb	tristanC: ^ related to that effort	16:51
frickler	ah, I missed the --detail thing, thx. node deleted	16:52
clarkb	frickler: thanks!	16:52
clarkb	tristanC: I guess the container is defined to run as root in there? Will have to see how testing goes for overriding that	16:53
clarkb	It was complaining about a lack of content in the groups file. I thought that was maybe related to a yaml parsing issue, but now wonder if no groups file exists at all and that is part of the issue?	16:54
corvus	clarkb: deleted	16:56
clarkb	my rough plan for today is to get the haproxy statsd update changes in and if that looks happy maybe go for the zookeeper statsd changes too	16:58
clarkb	While also figuring out user stuff in 816769	16:58
tristanC	clarkb: the user is not defined, but yes it currently expects uid 0 for the home directory needed by openssh	17:01
clarkb	I've approved the bullseye update for haproxy-statsd	17:13
clarkb	fungi: if you can rereview https://review.opendev.org/c/opendev/system-config/+/816764 that would be great. I'm not sure if the : before was causing problems or not but this makes it more consistent with other setups.	17:14
*** jpena is now known as jpena|off		17:22
clarkb	fungi: ok 158.69.73.60 is up and running. Do you think I should run a useradd and an adduser for both system and non system users and see what the results are? then maybe install a package that comes with a system user/group (libvirt?)	17:32
clarkb	I'm going to start with the package install since that is easy to uninstall and purge	17:35
fungi	yeah, those sound like good enough tests	17:36
clarkb	huh libvirt doesn't add a user or group I swear that it did /me checks devstack	17:37
fungi	well, also packages which add users are going to do so in the system (100-999 by default) range	17:37
fungi	clarkb: try installing snmpd?	17:38
fungi	it creates a Debian-snmp user and group	17:38
clarkb	I needed libvirt-daemon-system apparently	17:40
clarkb	looks like libvirt-dnsmasq got uid 112 which is the next system uid in the low range	17:40
clarkb	(that's a good indication we aren't breaking this too much)	17:40
clarkb	I'll try snmpd and then adduser/useradd stuff	17:40
clarkb	fungi: snmp is already there	17:40
clarkb	from before we add our stuff on top	17:40
fungi	oh, right, this is less a test node and more a deployed server in that regard	17:42
fungi	clarkb: looking at my workstation, try installing usbmuxd	17:44
fungi	should create a usbmux user	17:44
fungi	or try tcpdump, or postgresql	17:44
fungi	tcpdump might be the best test since it has few deps and creates both a user and a group, whereas usbmux only creates a user	17:46
clarkb	fungi: https://paste.opendev.org/show/bRbYPCBvUhstKOGyCnNs/	17:52
clarkb	it seems that adduser/addgroup do what we want but useradd/groupadd do not	17:53
clarkb	It seems that login.defs is ignored by adduser, hence it assigned uid 1000	17:53
clarkb	then useradd respects login.defs and adds the user as biggest uid +1	17:54
clarkb	To make these somewhat consistent with each other I guess we reduce our max values down to say 9999 in both login.defs and adduser.conf?	17:54
clarkb	also anecdotally it seems that package installs use adduser and not useradd, but they create system users and those all end up in the low system range anyway	17:55
clarkb	heh we already include tcpdump in our test images so it is early in the list	17:55
clarkb	I'm going to manually edit login.defs uid max and see if it shifts down as we want	17:56
fungi	yeah, i say we adjust the max values in both login.defs and adduser.conf, and also adjust the regular minimums in adduser.conf to be consistent with our current login.defs	17:59
fungi	that gives us the option of either putting static users/groups for containerized services between 1001 and 1999, or above 9999	18:00
fungi	we seem to start our admin users at 2000	18:00
clarkb	ok confirmed lowering the maxes makes useradd and groupadd respect them even if there are higher values existing	18:00
clarkb	fungi: yup I'm thinking that going higher than 9999 for containerized services will work well since we already do that for a number of them like zuul/nodepool/zk	18:00
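The behavior tested here can be summarized with two simplified models: adduser takes the lowest free ID in its adduser.conf range, while useradd takes one above the highest ID already used inside its login.defs range and ignores IDs outside that range. This is a sketch under those assumptions, not the real adduser or shadow-utils code (useradd's gap search on overflow and all system-account handling are left out); the ID sets and ranges below are illustrative.

```python
def adduser_next_id(used, first, last):
    """Simplified adduser model: lowest unused ID in [first, last]
    (adduser.conf FIRST_UID/LAST_UID), or None if the range is full."""
    for candidate in range(first, last + 1):
        if candidate not in used:
            return candidate
    return None


def useradd_next_id(used, id_min, id_max):
    """Simplified useradd model: one above the highest ID already used
    inside [id_min, id_max] (login.defs UID_MIN/UID_MAX). IDs outside the
    range are ignored; returns None on overflow instead of searching gaps."""
    in_range = [i for i in used if id_min <= i <= id_max]
    if not in_range:
        return id_min
    candidate = max(in_range) + 1
    return candidate if candidate <= id_max else None


# Illustrative existing IDs: admin users plus one 10001 container user.
used = {2000, 2001, 3000, 10001}
print(adduser_next_id(used, 1000, 59999))  # 1000
print(useradd_next_id(used, 3000, 59999))  # 10002: jumps past the container user
print(useradd_next_id(used, 3000, 9999))   # 3001: 10001 is outside the lowered max
```

In this model, lowering the max is what keeps a high container ID from dragging every later useradd allocation above it.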
clarkb	gerrit is special for raisins not worth changing but if everything else can match up that way I think that would be good	18:00
fungi	though we should probably avoid values over 60000	18:01
clarkb	fungi: ya I doubt we'd need to go that high	18:01
clarkb	libvirt uses a high value like that fwiw	18:01
clarkb	so it seems some system packages may also explicitly go high	18:01
fungi	right	18:01
clarkb	fungi: did you want to put that change together since you spec'd it out earlier (all I did was some basic testing to confirm behavior)	18:02
fungi	yeah, i can, just a sec	18:02
clarkb	And then ya we can do 2000-9999 for system level normal users (sorry if that statement didn't make sense). The distro continues to have 0-999 for system level system users then we can use >=10000 for our container users	18:03
clarkb	ah actually the ranges are this: 0-999 for distro system users/groups, 1000-1999 unallocated, 2000-2999 infra-root users, 3000-9999 non system users created by config mgmt, >=10000 free for use in containers	18:06
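The ranges listed here could be encoded as a lookup table, for example to sanity check which bucket an existing account falls into. The boundaries come from this plan plus the earlier note to stay below 60000; the function and label names are hypothetical.

```python
import bisect

# (first ID of the range, label); each range runs up to the next entry's start.
RANGES = [
    (0, "distro system users/groups"),
    (1000, "unallocated"),
    (2000, "infra-root users"),
    (3000, "non-system users created by config management"),
    (10000, "container service users"),
    (60000, "avoid: above LAST_UID/LAST_GID"),
]

_STARTS = [start for start, _ in RANGES]


def classify_id(numeric_id):
    """Return the label of the range that numeric_id falls into."""
    if numeric_id < 0:
        raise ValueError("UIDs/GIDs are non-negative")
    # bisect_right finds the first start greater than numeric_id; step back one.
    index = bisect.bisect_right(_STARTS, numeric_id) - 1
    return RANGES[index][1]


print(classify_id(112))    # a package-created system user
print(classify_id(10001))  # e.g. the zuul/nodepool/zk users
```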
clarkb	spot checking things on nb0X we have letsencrypt as gid 10002 because nodepool group is 10001. This implies ansible is using useradd/groupadd and not addgroup/adduser	18:09
clarkb	I think this is ok and the next redeployment of those services will end up correcting that sort of thing	18:10
clarkb	any objection to me approving 816764 now?	18:11
opendevreview	Jeremy Stanley proposed opendev/system-config master: Lower UID/GID range max to make way for containers https://review.opendev.org/c/opendev/system-config/+/816869	18:11
opendevreview	Merged opendev/system-config master: Update haproxy-statsd to bullseye and python3.9 https://review.opendev.org/c/opendev/system-config/+/816765	18:12
fungi	clarkb: no objection	18:12
fungi	also 816869 is the account range adjustments as discussed	18:12
clarkb	yup +2'd as noted from my spot checking letsencrypt on nodepool and zuul nodes will be weird, but that will sort of self correct over time	18:13
clarkb	if we really wanted to we could probably chown all the 10002 group stuff over to say 3000 or whatever the actual next value is	18:14
clarkb	I'm thinking things like 816869 and 816771 might be best on not a friday. I won't object if others want to push them in but I'd like to get a bike ride in today as there is no rain and generally not worry about fixing that up over the weekend if it has a sad :)	18:16
clarkb	tristanC: so is there no way to run this as a different group? docker says docker: Error response from daemon: unable to find group 11000-i: no matching entries in group file. I am able to override uid:group for other containers that don't set the group either. This makes me wonder if it is something about the /etc/group file not existing at all?	18:20
clarkb	oh wait I think I see it, that is a bug in my script edit	18:22
clarkb	ugh	18:22
clarkb	the -i is important :(	18:22
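The error ("unable to find group 11000-i") fits the assumption that Docker uses numeric parts of a --user or compose user: value as raw IDs, but has to resolve non-numeric parts by name in the image's /etc/passwd or /etc/group. A hypothetical pre-flight check along those lines could catch a mangled value before the container is started; the function name is made up and the lookup rule is our reading of Docker's behavior, not a documented API.

```python
def parts_needing_lookup(user_spec):
    """Return the non-numeric parts of a "user[:group]" spec, which
    (by assumption) Docker must resolve by name inside the image."""
    user, _, group = user_spec.partition(":")
    if not user:
        raise ValueError("user part must not be empty")
    needs_lookup = []
    if not user.isdigit():
        needs_lookup.append(("user", user))
    if group and not group.isdigit():
        needs_lookup.append(("group", group))
    return needs_lookup


print(parts_needing_lookup("11000:11000"))    # []
print(parts_needing_lookup("11000:11000-i"))  # [('group', '11000-i')]
```

An empty result means both parts are plain numbers, so no passwd/group entries should be needed in an image like the nix-built one that has none.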
opendevreview	Clark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot https://review.opendev.org/c/opendev/system-config/+/816769	18:24
opendevreview	Clark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user https://review.opendev.org/c/opendev/system-config/+/816770	18:24
clarkb	fungi: ^ that's rebased on top of your change with the test force failure removed	18:24
tristanC	clarkb: i think you would need to keep the image default user, but you should be able to map it to an arbritary host uid. Otherwise I can bake the uid you need in the image	18:24
tristanC	i mean if that is easier for you	18:25
clarkb	tristanC: isn't it just running a process though? so as long as we bind mount the files with the correct perms we are good?	18:25
clarkb	I think it would only be a problem if the executable isn't +x or readable for the different uid/gid or the bot tries to write to somewhere that needs different perms	18:25
tristanC	clarkb: i'm not entirely sure how docker handles the --user arg, i guess what needs to be checked is that the ~/.ssh/id_rsa key is readable	18:29
tristanC	clarkb: fwiw here are my notes about rootless podman with regards to sharing host uid with container: https://github.com/podenv/podenv/blob/main/docs/references/userns.md	18:29
clarkb	tristanC: yup the rest of that change chmods the contents of the bind mount to match	18:30
clarkb	tristanC: for mapping my concern with that is it seems docker uses a single mapping? which means that if you want to map uid 0 in one container to user foo and uid 0 in another container to user bar you can't? But maybe I'm missing something important there	18:31
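For reference, a user-namespace remap offsets each container UID into a subordinate range taken from /etc/subuid (lines of the form name:start:count), so container UID 0 becomes host UID start. Docker's userns-remap is configured once for the whole daemon, so every remapped container shares the same range, which is the limitation raised here. A minimal sketch of the arithmetic; the dockremap entry in the example is illustrative.

```python
def parse_subid_line(line):
    """Parse one /etc/subuid or /etc/subgid line into (name, start, count)."""
    name, start, count = line.strip().split(":")
    return name, int(start), int(count)


def container_to_host_id(container_id, start, count):
    """Container ID n maps to host ID start + n, for 0 <= n < count."""
    if not 0 <= container_id < count:
        raise ValueError(f"container ID {container_id} is outside the mapped range")
    return start + container_id


# Illustrative entry; real start/count values depend on the host.
_, start, count = parse_subid_line("dockremap:165536:65536")
print(container_to_host_id(0, start, count))     # 165536
print(container_to_host_id(1000, start, count))  # 166536
```

Because the range is shared, container UID 0 lands on the same host UID in every remapped container, so two containers cannot be mapped to different host users this way.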
clarkb	hrm https://review.opendev.org/c/opendev/system-config/+/816765 only ran the promote job for the image in deploy	18:54
clarkb	I guess I'll need to manually pull and restart the container when the second change lands	18:54
clarkb	I'll go ahead and do that now for the first change update	18:54
clarkb	that also updated the haproxy image so it will restart as well. Should be quick. Any objection to me doing that now?	18:56
fungi	no objection	18:57
* clarkb goes for it		18:57
clarkb	that's done, I can reach opendev.org	18:57
clarkb	https://grafana.opendev.org/d/ZQmopePMz/opendev-load-balancer?orgId=1&from=now-5m&to=now shows new data appears to be arriving	18:58
clarkb	I'll repeat this once the second change lands and that one likely won't want to restart haproxy, just the statsd process	18:58
fungi	rackspace opened a ticket saying there's an outage or impending outage for the cacti trove instance, i haven't had a chance to look into it yet	18:59
clarkb	I'm going to eat lunch and when that is done plan to do the second haproxy-statsd restart	19:05
opendevreview	Merged opendev/system-config master: Run haproxy-statsd as uid 1000 https://review.opendev.org/c/opendev/system-config/+/816764	19:28
clarkb	deploy jobs updated ^ automatically and https://grafana.opendev.org/d/ZQmopePMz/opendev-load-balancer?orgId=1&from=now-5m&to=now still shows new data arriving and the uids lgtm	19:39
clarkb	https://review.opendev.org/c/opendev/system-config/+/816762/ is probably reasonably safe to land now as a result?	19:40
clarkb	fungi: ^ what do you think? I am probably going to be in and out today as I try to enjoy the lack of rain	19:40
fungi	yep, i've approved it now	19:43
fungi	thanks!	19:43
fungi	we're gearing up for a prolonged wind event here, so need to switch gears shortly to rearrange things on the deck before i'm stuck doing it in the middle of a tempest	19:44
mordred	fungi: you should rig up some lines and pulleys so that you can treat the house like a sailboat and set the deck for the prevailing wind direction	20:06
clarkb	Fungi's Moving Castle	20:07
fungi	yeah, for now i'm just battening down the hatches	20:19
fungi	also an aggressive last-minute vegetable harvest, because whatever we don't take off the plants, the wind will	20:21
clarkb	corvus: service-zuul is failing because of zuul01 not having LE configuration	20:30
corvus	clarkb: ah, thx. i'll see if i can fix that	20:34
clarkb	if I'm reading the logs correctly, the actual error is that apache fails to start as a result	20:35
corvus	that makes sense; maybe we should just go ahead and add the LE stuff to zuul01 even tho we're not using it	20:42
corvus	either that, or remove the apache stuff	20:42
clarkb	ya I think adding LE to it would be fine. Theoretically we'll want that in the near future anyway?	20:43
corvus	unless we want to separate web from schedulers	20:44
clarkb	ah	20:44
corvus	i haven't thought that much ahead, and it may not be worth thinking ahead until we see resource usage of both of them	20:44
corvus	under the new regime	20:44
corvus	(does a zuul-web take up a lot of ram because it's a sort-of-scheduler?)	20:45
opendevreview	Merged opendev/system-config master: Run zookeeper-statsd as the zookeeper user https://review.opendev.org/c/opendev/system-config/+/816762	20:46
opendevreview	Merged opendev/system-config master: Update zookeeper-statsd to python3.9 on bullseye https://review.opendev.org/c/opendev/system-config/+/816763	20:46
clarkb	I'm not sure if ^ will have the same issue as the haproxy statsd update where the bullseye update doesn't actually trigger deploy jobs. I'll manually pull and up -d if necessary	20:46
opendevreview	James E. Blair proposed opendev/system-config master: Add LE config for zuul01 https://review.opendev.org/c/opendev/system-config/+/816903	20:48
clarkb	hrm the promote and the service-zookeeper jobs seem to run concurrently as well	20:51
clarkb	the promote is probably quick enough that this isn't a real problem but I guess something else to look at	20:52
clarkb	the zookeeper image has updated as well so it's going through that similar to updating the haproxy. The playbook does one zk at a time to avoid outages	20:54
clarkb	oh wait no the zookeeper images were already up to date	20:56
clarkb	only the stats restarted. The order of the docker ps output changed	20:56
clarkb	looking at grafana we are still getting zk data so I think the first update is happy.	20:57
clarkb	I'm likely to be doing a school run during the second change's pass due to the hourly jobs queueing up here in a minute or two	20:59
clarkb	but I don't expect any problems now that the first one is in and happy	20:59
clarkb	corvus: check the note I left on 816903	21:01
clarkb	corvus: I think you may want to add in the borg backup excludes too	21:01
clarkb	maybe those should go in a group var though	21:01
clarkb	infra-root I've approved the mailman3 spec as minor feedback updates were made only since we discussed at our last meeting	21:03
clarkb	confirmed that the deploy job is the only one queued for the bullseye update on that container image. I'll manually pull and up -d when i get back from the school run	21:05
corvus	oh i missed a git add, sorry	21:07
opendevreview	James E. Blair proposed opendev/system-config master: Add LE config for zuul01 https://review.opendev.org/c/opendev/system-config/+/816903	21:07
corvus	that's what i thought i pushed :)	21:08
opendevreview	Merged opendev/infra-specs master: Add a specification for Mailman 3 https://review.opendev.org/c/opendev/infra-specs/+/810990	21:08
*** jonher_ is now known as jonher		21:28
*** yoctozepto8 is now known as yoctozepto		21:28
ianw	clarkb: if you still have that window open, you can remove the buildkit nodes, the gerrit 3.4 one i still have up for poking at	21:56
ianw	(otherwise i'll do it later)	21:56
fungi	ianw: i'll clean them up	21:58
fungi	and thanks!	21:58
fungi	hope your saturday isn't intolerable	21:58
ianw	so far so good :)	21:59
clarkb	fungi: thanks.	21:59
clarkb	I'm about to do the statsd update on the zks	21:59
fungi	i'm trying to figure out why setuptools seems to have suddenly become uninstallable in tripleo and devstack jobs as of about an hour ago	22:00
ianw	looks like we've got a 9-stream image ready in rax-ord, so that's good	22:00
ianw	friday night, let's release!	22:01
fungi	ianw: the 9-stream work broke centos-7 image builds, there's a simple fix up	22:01
* fungi checks		22:01
clarkb	fungi: if you have a link to one of those failures I can take a look as well once zk stats are done	22:01
fungi	https://review.opendev.org/816813	22:01
fungi	it's project-config, so shouldn't block dib releases	22:02
fungi	clarkb: there are several linked in #openstack-infra in the last few minutes	22:02
ianw	yeah, that looks fine. we probably need to think about that matching for 10-stream, but one thing at a time	22:02
fungi	i was worried it could be related to the pbr bump, but seems to have started up hours after	22:02
clarkb	ah cool I haven't caught up on all irc channels yet	22:03
clarkb	fungi: ya the pbr bump was from wednesday	22:03
clarkb	or early yesterday. It doesn't do pyproject.toml properly from what I've seen but existing users should be fine	22:03
fungi	the reqs update for the pbr bump merged around 1700 today	22:03
fungi	but still far earlier than these errors began, it seems	22:03
clarkb	pbr is setup_requires so requirements shouldn't really affect it much	22:04
fungi	right	22:04
fungi	ianw: the nodes held for buildkit debugging are deleted now	22:05
clarkb	ok I'm happy with zk stats. Updated the container on all three nodes and still getting data on grafana dashboards	22:05
fungi	awesome	22:05
fungi	oh, that reminds me, i was going to check on that rax ticket about the cacti db	22:05
fungi	i'll try to take a look at that shortly	22:06
opendevreview	Merged openstack/project-config master: Fix haveged installation in CentOS7 https://review.opendev.org/c/openstack/project-config/+/816813	22:12
fungi	okay, so there were two tickets, one for the db behind wiki.openstack.org and one for the db behind cacti.openstack.org	22:58
fungi	both services seem to be working fine now though	23:01
corvus	my current plan is to restart zuul on master tomorrow morning and try a 2nd scheduler again. i'll let it run as long as it runs without errors. if there's an error, i'll triage it and may try to roll forward if it seems tractable.	23:17
corvus	(otherwise clear state and roll back to .4)	23:17
fungi	i expect to be around, so happy to help or just keep an eye on it	23:17
corvus	i think it'll be interesting either way :)	23:18
fungi	yep!	23:18
fungi	seems like this is getting really close	23:18
corvus	yep i think so	23:19
Clark[m]	Sounds good. Not sure how much I'll be around though	23:30
