Thursday, 2022-08-11

ianw	rocky 9 is still in a failure loop	00:08
ianw	its still looking for ntpdate ntp ntp-perl	00:09
ianw	either the project-config change isn't rolled out, or something else is going on	00:09
ianw	i think its the former	00:10
clarkb	did the change to how we deploy project-config land? that was one of my concerns with it that somehow we'd break pushing project-config out	00:11
ianw	oh, heh, it would help if the change was actually merged	00:11
ianw	https://review.opendev.org/c/openstack/project-config/+/852518	00:11
clarkb	oh heh I can review that	00:11
clarkb	doesn't https://review.opendev.org/c/openstack/project-config/+/852518/2/nodepool/elements/infra-package-needs/install.d/10-packages change from not matching 9-stream to matching 9-stream?	00:12
clarkb	oh hrm that is what the ps1 comments are about	00:13
ianw	actually i think you're right	00:14
ianw	that should be a !	00:14
clarkb	ya I just tested it locally and 9-stream =~ '9' evaluates to true under [[	00:15
ianw	and i need to turn it around in the other file too...	00:15
opendevreview	Ian Wienand proposed openstack/project-config master: nodepool: update package maps for Rocky 9 https://review.opendev.org/c/openstack/project-config/+/852518	00:17
fungi	oh whoops	00:18
clarkb	ianw: I think that update forgot the ! on the second file?	00:19
ianw	for that one we want to match 9 and exit, right?	00:19
clarkb	oh wait no its exit 0 in that condition to skip	00:19
opendevreview	Merged opendev/system-config master: system-config-run: bump base timeout to 3600 https://review.opendev.org/c/opendev/system-config/+/852479	00:19
opendevreview	Merged opendev/system-config master: Also pin pip/setuptools when creating Xenial venvs https://review.opendev.org/c/opendev/system-config/+/852786	00:19
clarkb	ya sorry we want similar behavior but achieve it by doing the inverse in the block so the condition should be inverted	00:19
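Editor's aside: the pitfall discussed above is that an unanchored match for '9' also succeeds on '9-stream'. Below is a minimal sketch of the same behaviour in Python, whose `re.search` is likewise unanchored. The function names are hypothetical, and the anchored pattern is only one possible stricter test, not what the project-config change actually uses.

```python
import re

def naive_is_el9(release: str) -> bool:
    # Unanchored search: true for any string that contains a "9",
    # which includes "9-stream".
    return re.search(r"9", release) is not None

def strict_is_el9(release: str) -> bool:
    # Anchored match: only a bare "9" or a "9.x" point release.
    return re.fullmatch(r"9(\.\d+)?", release) is not None

for release in ("9", "9.0", "9-stream", "8"):
    print(release, naive_is_el9(release), strict_is_el9(release))
```

Either anchoring the pattern or negating the test (as was done here) avoids treating 9-stream as a plain 9 release.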
fungi	and yay things finally merged	00:21
ianw	cool. i'll remove the problem venvs, and merge the borg 1.1.18 update, and watch that it deploys	00:23
ianw	it's really only storyboard01 and translate01	00:27
ianw	lists.openstack.org is totally non-puppeted now, isn't it?	00:27
clarkb	ianw: correct	00:28
fungi	ianw: cacti is our only other bionic server	00:31
fungi	and i guess we don't back it up	00:31
ianw	s/bionic/xenial/ right?	00:31
fungi	yes, sorry, i meant xenial	00:31
ianw	i guess that is on the chopping block for Prometheus	00:33
fungi	right	00:33
fungi	storyboard we still need to add ansible to deploy the containers we build	00:33
fungi	and zanata is... well... zanata just is	00:34
opendevreview	Merged openstack/project-config master: nodepool: update package maps for Rocky 9 https://review.opendev.org/c/openstack/project-config/+/852518	00:40
ianw	it does seem like http://mirror.facebook.net/centos-stream/9-stream/BaseOS/x86_64/os/repodata/ doesn't have the same contents as http://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/repodata/	01:10
ianw	Packages/grubby-8.40-59.el9.x86_64.rpm also doesn't exist on the fb mirror, one of the recent updated packages	01:12
ianw	it is @ http://dfw.mirror.rackspace.com/centos-stream/9-stream/BaseOS/x86_64/os/Packages/grubby-8.40-59.el9.x86_64.rpm	01:14
ianw	I sent mail to mirror-admin@lists.fedoraproject.org on 14th april about rsync not working on the rax mirrors	01:20
ianw	i never got a response, but i'm guessing someone fixed something	01:21
ianw	i just got a 502 from gerrit :/	01:26
ianw	logs have [2022-08-11T01:28:14.933Z] [HTTP-15318535] WARN org.eclipse.jetty.util.thread.QueuedThreadPool : QueuedThreadPool[HTTP]@35bfa7be{STARTED,20<=100<=100,i=0,r=10,q=200}[ReservedThreadExecutor@723aea0f{s=6/10,p=1}] rejected Accept@b7fa454[java.nio.channels.SocketChannel[connected local=/127.0.0.1:8081 remote=/127.0.0.1:50968]]	01:28
ianw	that starts at [2022-08-11T01:26:58.960Z]	01:30
ianw	at [2022-08-11T01:25:48.306Z] we had ERROR com.google.gerrit.httpd.GitOverHttpServlet.GerritUploadPackErrorHandler : Internal error during upload-pack from Repository[/var/gerrit/git/openstack/tacker.git] [CONTEXT project="openstack/tacker" request="GIT_UPLOAD" ]	01:31
ianw	it's over a minute earlier, so seems unlikely to be related	01:31
ianw	nothing in dmesg	01:32
ianw	3543612 gerrit2 20 0 121.3g 106.3g 60608 S 30.5 84.5 108119:50 java	01:33
ianw	nothing crazy in cpu/memory usage	01:33
*** rlandy|bbl is now known as rlandy		01:33
*** rlandy is now known as rlandy|out		01:36
opendevreview	Merged opendev/system-config master: install-borg: update to borg 1.1.18 https://review.opendev.org/c/opendev/system-config/+/852488	01:38
ianw	well it seems alive again	01:43
ianw	the error is an overflow of the HTTP incoming requests queue (httpd.maxQueued on your gerrit.config)	01:46
ianw	which seems about right, ssh was working	01:47
Clark[m]	Gerrit is rejecting new connections and Apache returns 502?	01:47
ianw	in summary i'd say, yep	01:48
Clark[m]	We might be able to up those limits given the larger server size.	01:48
Clark[m]	But I guess we need to see if someone needs to be pointed at gitea too	01:49
ianw	in the httpd gerrit logs we have	01:50
*** ysandeep|out is now known as ysandeep		01:50
ianw	an entry at 2022-08-11T01:26:37.694Z then the next at 2022-08-11T01:28:18.204Z	01:50
ianw	the queuedthreadpool thing started at 01:28:14	01:51
Clark[m]	Cacti points to a spike in tcp connections but other system resources don't seem to follow	01:51
Clark[m]	I wonder if we have some other pause event (gc?) That causes tcp to backup	01:52
Clark[m]	But if ssh was fine that doesn't seem to line up either	01:52
ianw	# cat sshd_log | grep 2022-08-11T01:26 | grep sf-project-io | grep LOGIN | wc -l	01:55
ianw	398	01:55
Clark[m]	If/when it happens again dumping cache stats which includes jvm internal memory info may be helpful	01:55
ianw	it seems like sf-project-io logged in 398 times in one minute	01:56
Clark[m]	We may be in a gc loop or maxed out in the jvm on memory	01:56
Clark[m]	Oh weird	01:56
ianw	i wonder if that ran us out of tcp ... something and ended up hanging the webserver bits	01:56
Clark[m]	And that would push us over the limit. We have limits on the number of connections per IP and per user but maybe they aren't sufficient here for some reason	01:56
ianw	that is the most suspicious thing i can see	01:58
Clark[m]	Maybe Tristan can help track down what might've caused that on their end	01:59
ianw	tristanC: ^ any idea why this would have had a big loop of logins at this time?	01:59
ianw	in better news storyboard & translate have functioning borg venvs now, with 1.1.18	02:06
opendevreview	Merged opendev/system-config master: letsencrypt-acme-sh-install: handle errors better in driver https://review.opendev.org/c/opendev/system-config/+/696211	02:47
*** pojadhav|out is now known as pojadhav|rover		02:49
opendevreview	Merged opendev/system-config master: letsencrypt: make acme.sh exits clearer https://review.opendev.org/c/opendev/system-config/+/850435	02:49
*** ysandeep is now known as ysandeep|afk		02:57
*** ysandeep|afk is now known as ysandeep		03:10
*** ysandeep is now known as ysandeep|away		03:25
opendevreview	Ian Wienand proposed opendev/system-config master: system-config-run-borg-backup: rename hosts to distro https://review.opendev.org/c/opendev/system-config/+/852685	03:33
opendevreview	Ian Wienand proposed openstack/project-config master: infra-package-needs: blank out coreutils for Rocky 9 https://review.opendev.org/c/openstack/project-config/+/852798	03:38
opendevreview	Merged openstack/project-config master: infra-package-needs: blank out coreutils for Rocky 9 https://review.opendev.org/c/openstack/project-config/+/852798	03:59
*** ysandeep|away is now known as ysandeep		04:13
ianw	hrm, https://review.opendev.org/c/opendev/system-config/+/852799 is there, but wasn't announced ^	04:14
ianw	and i don't think gerrit has picked it up	04:14
ianw	s/gerrit/zuul/	04:18
ianw	[2022-08-11T04:00:19.374Z] [SSH git-receive-pack /opendev/system-config.git (iwienand)] WARN com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker killed after 245282ms, cancelled (timeout=5282ms, task=RECEIVE_COMMITS(Processing changes))	04:22
ianw	guess what was happening at 04:00...	04:23
ianw	cat sshd_log | grep 2022-08-11T04:01 | grep sf-project-io | grep LOGIN | wc -l	04:24
ianw	98	04:24
ianw	it's more spaced out, but a lot of sf-project-io logins	04:24
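Editor's aside: the grep pipelines above count logins for one hand-picked minute. A rough sketch of the same count for every minute at once might look like the following. It relies only on what the greps relied on (the user name, the word LOGIN, and an ISO timestamp somewhere in the line). The sample lines are made up for illustration and are not real Gerrit sshd_log output.

```python
import re
from collections import Counter

# Matches a timestamp down to the minute, e.g. 2022-08-11T01:26
MINUTE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")

def logins_per_minute(lines, user):
    """Count lines mentioning `user` and LOGIN, grouped by minute."""
    counts = Counter()
    for line in lines:
        if user in line and "LOGIN" in line:
            match = MINUTE.search(line)
            if match:
                counts[match.group(0)] += 1
    return counts

# Hypothetical sample data, only shaped loosely like a log.
sample = [
    "[2022-08-11T01:26:01.000Z] sf-project-io LOGIN",
    "[2022-08-11T01:26:02.000Z] sf-project-io LOGIN",
    "[2022-08-11T01:26:03.000Z] someone-else LOGIN",
    "[2022-08-11T01:27:10.000Z] sf-project-io LOGIN",
]
for minute, count in logins_per_minute(sample, "sf-project-io").most_common():
    print(minute, count)
```

Against a real log, passing an open file object as `lines` would show the busiest minutes first, which makes bursts like the 01:26 and 04:01 ones easier to spot.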
opendevreview	Merged opendev/system-config master: system-config-run-borg-backup: add jammy test host https://review.opendev.org/c/opendev/system-config/+/852489	05:32
opendevreview	Merged opendev/system-config master: gate-groups: remove old backup group https://review.opendev.org/c/opendev/system-config/+/852684	05:36
*** marios is now known as marios|ruck		05:38
*** ysandeep is now known as ysandeep|afk		06:04
ianw	2022-08-11 06:16:57.023 | Build completed successfully ... rocky 9 finally worked	06:22
*** ysandeep|afk is now known as ysandeep		06:45
*** jpena|off is now known as jpena		06:57
*** ysandeep is now known as ysandeep|lunch		08:21
*** tosky_ is now known as tosky		09:33
*** ysandeep|lunch is now known as ysandeep		10:12
*** rlandy|out is now known as rlandy		10:38
*** ysandeep is now known as ysandeep|afk		10:55
*** dviroel|out is now known as dviroel		11:31
*** ysandeep|afk is now known as ysandeep		11:45
tristanC	ianw: i guess it's caused by a zuul's periodic trigger event. Is there something we could do about it?	12:23
*** efoley_ is now known as efoley		12:33
fungi	tristanC: maybe some of the newer jitter options for the timer trigger, if your zuul is new enough?	12:48
tristanC	fungi: thanks, i guess we'll need to update zuul to 6.2.0	13:16
fungi	that's just one idea, others may have different suggestions	13:20
*** rcastillo is now known as rcastillo|rover		13:57
Clark[m]	tristanC: fungi: opendev's periodic jobs don't seem to cause the same issues. Maybe compare the pipeline definitions and zuul connection settings for Gerrit?	13:58
*** pojadhav|rover is now known as pojadhav|afk		14:19
fungi	setuptools 64.0.0 is out, as is pbr 5.10.0	14:23
fungi	so keep an eye out for anything possibly related	14:25
fungi	https://setuptools.pypa.io/en/latest/history.html#v64-0-0	14:25
fungi	the big change in setuptools is the pep 660 editable installs implementation	14:25
fungi	so it could impact testing if projects have tox set to do editable by default, though the new stuff should only actually kick in for projects also doing pep 517 builds (via pyproject.toml configuration)	14:27
fungi	"Added ability of collecting source files from custom build sub-commands to sdist. This allows plugins and customization scripts to automatically add required source files in the source distribution."	14:28
fungi	that might actually come in handy for pbr	14:28
JayF	oh that's awesome, I've been waiting for setuptools to get that editable implementation	14:31
Clark[m]	tristanC: fungi: another thing we should rule out is sf hitting the login limit which causes immediate retries and a thundering herd. I don't have evidence of this just occurred to me that it could happen if the software retries aggressively	14:35
*** pojadhav|afk is now known as pojadhav		15:14
*** marios|ruck is now known as marios|out		15:30
*** pojadhav is now known as pojadhav|out		15:31
clarkb	'failed to open /etc/mailman/sites for linear search: No such file or directory' is the exim error for ianw's mm3 signup problem. I think that is an artifact of our old vhosting setup under mm2. I'll take a look at cleaning that up. I also notice that we have errors creating xapian indexes due to user in the upstream containers not aligning with our containers	15:33
clarkb	er not aligning with our hosts. I think that the expectation is those containers do start as root then they change their process ownership to the baked in mailman user. But its uid is 100 :/	15:34
clarkb	I'm not sure what the best way to handle that is. Maybe we can bind mount an /etc/passwd that changes the uid to align with what we want?	15:34
fungi	yes, we have a custom exim router on our mm2 servers to look up which mailing list chroot to use based on the domain	15:34
fungi	mm3 shouldn't need that	15:34
clarkb	oh to make the uid/gid situation worse the mailman-web and mailman-core gids are different	15:36
opendevreview	Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248	15:48
clarkb	fungi: something like ^ based on the contents of: https://github.com/maxking/docker-mailman/tree/main/core/assets/exim ?	15:48
clarkb	I'm going to cycle out the held node for ^	15:48
*** ysandeep is now known as ysandeep|dinner		15:48
clarkb	old hold deleted and new one created	15:49
fungi	clarkb: yeah, the simple config examples there ought to be valid for our case as well	15:49
clarkb	I'm still not sure what the best way to address the user mismatch is between host and containers	15:50
clarkb	we could configure docker to do the user offset thing and then set perms outside the container appropriately	15:50
clarkb	We could build our own image based on the upstream image that changes the uid and gid and chowns everything	15:50
clarkb	But one problem at a time :)	15:51
clarkb	apparently making a new docker image to update /etc/passwd and /etc/group and chowning everything is a common choice here :/	15:54
clarkb	I guess if ^ works then my change may actually send some emails now. To nobody@openstack.org and test@example.com. That should be fine	15:59
fungi	i thought we added firewall rules to prevent those from actually being delivered	16:00
fungi	if memory serves, i did that a while back when we were increasing our test coverage for ml server deployments	16:01
fungi	so that we could safely exercise the mta	16:01
clarkb	if we did I'm not finding them	16:03
clarkb	we set the "don't send email" flag on the list creation command under mm2	16:03
*** dviroel is now known as dviroel|lunch		16:04
clarkb	we run the service-lists.yaml playbook in the mm2 job and it creates the mailing lists. That test has template files for host vars for lists.o.o and lists.kc.io but neither seem to set up special iptables rules	16:04
clarkb	I guess we could've done it globally on all test nodes /me looks	16:04
fungi	mmm, yeah i'm trying to find where/when i remembered doing that	16:05
clarkb	aha that is where the rules live	16:05
clarkb	system-config/playbooks/zuul/templates/group_vars/all.yaml.j2	16:05
fungi	aha, yep that was last year in https://review.opendev.org/820900	16:05
clarkb	looks like we allow the port 25 connection over localhost but then reject it otherwise	16:05
fungi	right, that way mailman can send to exim, but exim can't send out	16:06
clarkb	that explains why ianw's emails hit exim but if the config wasn't broken would've been blocked from there	16:06
clarkb	ok cool	16:06
fungi	i added it precisely for this case, and in preparation for the mm3 work	16:06
clarkb	in that case my fixed up change should have errors in exim sending email out, but not config related errors	16:07
opendevreview	Merged openstack/project-config master: End project gating for openstack-helm-addons https://review.opendev.org/c/openstack/project-config/+/851857	16:07
clarkb	fungi: I guess if/when we want to test email we'd do that in a controlled setting removing the iptables rule and then trying to sign up with one of our email addrs?	16:08
clarkb	similar to when we'd like to test a list's behavior	16:08
fungi	right	16:09
fungi	that way it's explicitly under our control	16:09
fungi	down the road, if we wanted automated testing for something like that in a job, we could add pass rules in the firewall for another job node	16:10
clarkb	I guess we have to be careful that any buffered messages don't all get out when we drop the iptables rule too	16:10
*** jpena is now known as jpena|off		16:10
fungi	yes, `exim4 -Mrm ...` should allow us to delete them	16:11
fungi	`exim4 -bp` to list	16:11
clarkb	cool	16:11
*** ysandeep|dinner is now known as ysandeep|out		16:18
clarkb	ok job for that latest patchset completed and the node is held (104.130.172.61). There is only an exim mainlog. No rejectlog (or error log I forget what the full set is). That implies to me that maybe we didn't try to send email at all?	16:24
*** gibi is now known as gibi_pto		16:24
clarkb	`exim4 -bp` returns no results fwiw	16:25
clarkb	I guess the next steps are probably to continue trying to sign up for an account on the server through the web ui and see what that does as far as sending email. Then create a mailing list with our account as owner to see if we get emailed?	16:26
clarkb	ok new error: sender verify fail for <postorius@lists.opendev.org>: Unrouteable address	16:33
clarkb	I guess I need to update the exim config to make that valid?	16:33
clarkb	something in the mailman_verp_router I expect. But I don't understand it well enough to know what we should change	16:34
clarkb	senders = "-bounces@" <- does that need to be updated?	16:35
clarkb	also feel free to update the change. I think you understand this stuff a lot better than I do	16:36
clarkb	and ya the rejectlog existing now after my error would imply to me that maybe we aren't trying to send email when creating lists or adding list owners. Still needs better testing of that behavior, but that is encouraging	16:37
fungi	yeah, so exim is currently configured to verify sender addresses on receipt	16:43
fungi	need to think about what the postorius address is used for and how we'll route it	16:43
clarkb	fungi: that error was generated by me trying to sign up for an account on the test server. It is used to send the email verification message at least	16:45
fungi	yep, just wondering what else it might get used for	16:45
fungi	as soon as i get lunch cleared away i'll check the docs to see if they say	16:46
clarkb	good point. The example exim config from the docker image repo shows -bounces -etc seem to align with mm2	16:46
clarkb	so I don't think it is used for list management. But it could be that is an incomplete listing	16:46
fungi	looking at mm2 messages, we get notifications from mailman-owner@ for creation of new lists, from $foo-request@ when subscribing to a new list	16:52
fungi	system-wide account creation doesn't really fall into the same sort of category though	16:52
clarkb	the new require_files I pushed is buggy too I just realized. I gave it the in container path of the bind mount but exim is external to the containers so it needs the host side of the bind mount path	16:54
fungi	web searches would work better if i didn't constantly try to add a third "o" to "postorius"	16:54
fungi	i'll have to get back to this after lunch though	16:55
*** dviroel|lunch is now known as dviroel		16:55
opendevreview	Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248	16:57
clarkb	that fixes the bind mount side confusion	16:57
opendevreview	Merged openstack/project-config master: Add official-openstack-repo-jobs to openstack k8s charms https://review.opendev.org/c/openstack/project-config/+/852117	16:59
clarkb	corvus: re software factory hitting review.opendev.org with a bunch of connections all at once, is there anything that tristanC should be looking at other than pipeline jitter for periodic jobs? I don't think that openstack periodic jobs have similar issues, and they run quite a number of jobs too. Is it possible that node request timing delays in opendev are naturally breaking up	17:23
clarkb	those requests for git repos maybe? We merge before the node is ready but once an executor is assigned and nodes are ready we do a merge again for the job itself	17:23
clarkb	If sf is running all of those jobs out of containers that don't have provisioning delays I wonder if we're compacting all the requests into a much shorter period of time than in opendev	17:23
fungi	oh, i hadn't made that connection, but yeah that would make sense	17:29
clarkb	iirc the flow is event comes in and we do a merge for each event to determine which jobs to run. Then when job is ready to run on an executor that executor does the merge again for each job. In opendev the randomness of node provisioning likely acts as a good smoothing system for load on gerrit requests. But if your provisioning is near instant because it is effectively already	17:31
clarkb	provisioned and you are just reserving a slice of it then you'll potentially generate a lot of merges in a much shorter period of time	17:31
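Editor's aside: the smoothing effect described above (and the jitter suggested earlier) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Zuul's implementation: the helper names are hypothetical, the 60 second window is arbitrary, and 398 is borrowed from the login count earlier in the log purely as an illustrative burst size.

```python
import random
from collections import Counter

def peak_per_second(start_times):
    """Largest number of requests that start within any one-second bucket."""
    buckets = Counter(int(t) for t in start_times)
    return max(buckets.values())

def with_jitter(start_times, max_jitter, rng):
    """Delay each start by a random amount between 0 and max_jitter seconds."""
    return [t + rng.uniform(0, max_jitter) for t in start_times]

rng = random.Random(0)           # seeded so the run is repeatable
burst = [0.0] * 398              # everything fires at the same instant
spread = with_jitter(burst, 60.0, rng)

print("peak without jitter:", peak_per_second(burst))
print("peak with jitter:   ", peak_per_second(spread))
```

Random node provisioning delays act like the jitter here: the total work is unchanged, but the peak load seen by Gerrit in any one second drops sharply.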
fungi	the mm3/postorius docs are rather circular	17:38
fungi	postorius: "for config instructions see mm3 docs"	17:39
fungi	mm3: "postorius is the management interface, see its docs for further info"	17:39
fungi	https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/docs/postorius.html	17:40
clarkb	The mm3 docs have been painful to work with. Their rest docs don't really show you much about the requests and hide it all behind helper functions they've written	17:43
clarkb	Which is fine to also document. But the API should be more explicit imo	17:43
fungi	the main thing i want to figure out is if there is any expectation that you can send control messages to the postorius address, or whether we need to just nullroute it	17:48
clarkb	Based on the exim4 configs in the mm3 docker repo I expect we can nullroute it otherwise they would've called the addr out there	17:49
fungi	that's my assumption too, just trying to find any indication that it could be a bad assumption	17:49
fungi	looking at acl_check_rcpt in our current exim4.conf, we automatically accept messages with a null host (comments say that's local injection), and also anything for postmaster at local domains, then apply sender verification to anything else	17:55
fungi	i guess messages originating from within containers may be breaking that assumption and we might need to add some additional exceptions for loopback or whatever?	17:57
clarkb	is it looking at the tcp connection details or the sender in the headers?	17:58
clarkb	in this case it should be a local tcp connection	17:58
clarkb	the error specifically says sender <postorius@lists.opendev.org> couldn't be verified which makes me think the issue is actually at a header level?	17:59
clarkb	(if it wasn't a local tcp connection then our iptables rule would've blocked it)	17:59
fungi	well, the address it's verifying is the sender specified in the smtp protocol header, yes	18:02
fungi	the exclusions before that verify rule could be based on tcp connection information though	18:02
fungi	the "require verify = sender" line in acl_check_rcpt:	18:05
fungi	the main exclusion before that rule seems to be "accept hosts = :" which the comments say is checking for local mail injection, but i guess that's from calling /usr/lib/sendmail or the like	18:06
fungi	otherwise the host would be 127.0.0.1 or ::1	18:06
fungi	or localhost or something like that	18:07
clarkb	the comment says " Accept if the source is local SMTP (i.e. not over TCP/IP)."	18:07
clarkb	that would be dropping files into the correct location? This is smtp though over localhost:25	18:07
clarkb	oh and localdomains would be lists99.opendev.org not lists.opendev.org or lists.openstack.org etc	18:08
clarkb	We could set exim_local_domains? But I'm not sure that is the right way to solve this. Might be better to make the address verifiable, but I'm not sure what that requires	18:09
fungi	probably adding postorius to /etc/aliases would suffice	18:10
fungi	we could manually stuff it into /etc/aliases on the real lists.o.o if we wanted, assuming the test server doesn't consider it to be local	18:12
clarkb	is the test server querying exim on the prod lists.opendev.org server to verify that?	18:12
fungi	in theory, yes, unless it thinks lists.opendev.org is a local domain	18:13
clarkb	hrm we add lists.opendev.org to mm_domains which is added to exim_local_domains	18:15
clarkb	oh but we only allow postmaster at the local domain	18:16
clarkb	so ya I think it would be sufficient to add it to /etc/aliases on the test server	18:16
*** tosky_ is now known as tosky		18:16
fungi	yeah, domainlist localdomains in the exim4.conf has it	18:17
fungi	er, local_domains	18:17
clarkb	fungi: https://review.opendev.org/c/opendev/system-config/+/851248/56/playbooks/zuul/files/host_vars/lists99.opendev.org.yaml line 42 or so we add an entry? I'm not sure what we should alias it to though	18:18
fungi	we could alias it to one of the magic addresses like :fail:	18:19
clarkb	fungi: something like: ' postorius: :fail: Outgoing email only from this address' ?	18:20
fungi	it's worth a try. you could add it locally on the server and test	18:20
clarkb	ok let me try that	18:20
fungi	just stick that line in /etc/aliases	18:20
clarkb	heh "a user is already registered with this address" Not a good failure method if the server 500s but creates the records anyway. /me tries another address	18:22
fungi	could probably also delete the broken account	18:22
clarkb	sender verify fail for <postorius@lists.opendev.org>: outgoing email only from this address	18:23
fungi	though also, maybe password recovery process works with the account	18:23
clarkb	that means /etc/aliases did modify the behavior but :fail: isn't what we want	18:23
fungi	okay, so 1. we know it's checking the local delivery, and 2. we can't :fail: it	18:23
fungi	:blackhole: maybe?	18:24
fungi	https://www.exim.org/exim-html-3.20/doc/html/spec_23.html#SEC634	18:24
corvus	clarkb: tristanC: jitter would be the main thing. https://review.opendev.org/848516 could potentially help in some cases (less likely to help with periodic jobs, but otherwise very good for a 3pci system to cut down on its impact).	18:24
clarkb	fungi: that worked. It claims email was sent to me	18:25
fungi	interesting!	18:26
corvus	blackhole is ~= delivering to /dev/null	18:26
corvus	successfully	18:26
fungi	yeah, mainly just confirming that's effective for bypassing sender verification	18:26
clarkb	corvus: ya in this case mm3 uses postorius@listdomain to send email verification emails to people doing account signups	18:26
clarkb	corvus: those emails were getting rejected on sender verification by exim. Will it blackhole the email even if it is the sender? or just if it is the recipient?	18:27
corvus	just rcpt	18:27
clarkb	I didn't get the email but didn't expect to due to our iptables rules so hard to say why it wasn't delivered.	18:27
clarkb	Ah ok so this is probably workable (for now anyway)	18:27
clarkb	as an alternative I could alias it to mailman which is a local user	18:28
fungi	a cleaner option might be to configure exim to skip checking the postorius sender address and then add an explicit :fail: message for it for cases where users try to send something to or reply to that address	18:28
fungi	rather than accepting messages for it and throwing them away	18:28
fungi	but this is sufficient for now	18:28
clarkb	I can add a TODO to the yaml config to improve the situation. Will push a patch up shortly	18:28
clarkb	in the exim mainlog I see it failing to connect to send that email indicating our iptables rules are working	18:29
fungi	perfect. broken as designed! ;)	18:29
clarkb	and there are definitely no logs like that for nobody@openstack.org or test@example.com implying that creating lists and adding owners isn't generating emails to them	18:30
clarkb	But I want to do more testing of that before we roll with it	18:30
fungi	yeah, for the test list creation we pass a flag to tell it not to send a notification	18:31
fungi	or at least we did at one point	18:31
opendevreview	Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248	18:32
clarkb	fungi: thats a mm2 only thing as far as I can tell	18:32
fungi	oh, got it	18:32
clarkb	the mm3 rest api doesnt appear to have toggles for that sort of thing	18:32
clarkb	I'm going to update the hold to catch that latest update	18:33
clarkb	it might also be some sort of situation where if an account matching that email address exists and the email addr is verified then you'll get emailed	18:34
clarkb	But when boot strapping lists like this with no accounts it won't send email to unverified addresses	18:35
clarkb	If that is the case then the next thing we need to sort out is whether or not adding an account for that email address later will properly associate the ownership to the list	18:35
clarkb	I think we can test this by creating a user to match one of those email addresses, then using django admin interface to manually verify the email then see if we are a list admin	18:36
tristanC	corvus: clarkb: ianw: ok thank you, we'll update zuul and setup the jitter settings to reduce the number of connections opened by sf-project-io	18:38
clarkb	fungi: I've got to do lunch and drop the kids off after, but is there a way to read the unsent email off disk from exim? If so I should be able to follow the link that postorius sends out to verify my account that way without the email actually getting delivered	18:55
corvus	clarkb: /var/spool/exim	18:57
clarkb	corvus: thanks!	18:58
corvus	there's a pair of files for each msg in the queue (suffixed with -D and -H)	18:58
fungi	yep, that exactly	18:58
corvus	filename is queue id which shows up in the logs	18:58
fungi	also the same queue id which shows up in `exim4 -bp` or `mailq` output	18:59
fungi	in a parallel directory you'll also find an ephemeral delivery log for that queue item	19:00
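Editor's aside: the spool layout described above (a pair of files per queued message, named after the queue id and suffixed -H and -D) can be walked with a few lines of Python. This is a sketch only. The helper names and the fake queue id are hypothetical, the demo runs against a temporary directory rather than a real spool, and a real message body may be MIME-encoded, in which case a plain text search like this would miss the link.

```python
import os
import re
import tempfile

URL = re.compile(r"https?://\S+")

def queued_message_ids(input_dir):
    """Return queue ids that have both a -H (headers) and a -D (data) file."""
    names = set(os.listdir(input_dir))
    ids = {name[:-2] for name in names if name.endswith("-H")}
    return sorted(qid for qid in ids if qid + "-D" in names)

def urls_in_body(input_dir, queue_id):
    """Return anything that looks like a URL in the message's -D file."""
    path = os.path.join(input_dir, queue_id + "-D")
    with open(path, errors="replace") as handle:
        return URL.findall(handle.read())

# Demo against a throwaway directory with one made-up message.
with tempfile.TemporaryDirectory() as spool:
    fake_id = "1oMfake-000001-AA"  # hypothetical, not a real queue id
    contents = {
        "-H": "headers would go here\n",
        "-D": "Please confirm: https://lists.example.org/confirm/abc\n",
    }
    for suffix, text in contents.items():
        with open(os.path.join(spool, fake_id + suffix), "w") as handle:
            handle.write(text)
    for qid in queued_message_ids(spool):
        print(qid, urls_in_body(spool, qid))
```

On a held test node the directory argument would be the spool's input directory, and reading it will normally need root.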
clarkb	great that should allow me to check account verification without worrying about spam filters and iptables.	19:01
clarkb	And then after that I think I'll try creating another list and make myself an owner and see if it tries to send email to me	19:02
clarkb	and maybe also create another list for an uncreated user and see if it tries to send email to them	19:02
fungi	yeah, i agree, not sending to addresses which don't have an account may be a safety measure in order to avoid becoming as much of a potential spam source	19:03
fungi	i think bulk account creation is part of the list subscriber migration	19:03
fungi	idea being people will get new mm3/postorius accounts the first time a list they're subbed to is imported, and if they want to use it they can do a password reset dance through the webui	19:04
fungi	we would presumably do the same for list owners/moderators	19:06
clarkb	it definitely lets me add an owner email address when that email addr doesn't have an associated account yet	19:06
clarkb	that is what the current change does, it sets owner to test@example.com	19:06
fungi	hopefully once the account for that address is created the user will be able to manage the list	19:06
clarkb	yup exactly	19:07
*** dviroel is now known as dviroel|afk		19:07
fungi	and unlike mm2, no more shared passwords for owner/mod access, it's just associated with your account (so you have a login to postorius which gets you access to manage all the lists you're associated with, as well as your subscription settings for any to which you're subscribed)	19:09
clarkb	158.69.70.114 is the new host fwiw	19:14
opendevreview	Davlet Panech proposed openstack/project-config master: Add starlingx/jenkins-pipelines repo https://review.opendev.org/c/openstack/project-config/+/852919	19:47
clarkb	I have successfully created an account on that server by grabbing the confirmation url out of /var/spool/exim4/input	20:21
clarkb	I just tested that a single account is valid across multiple vhosts. Though I had to log in again	20:22
clarkb	(the cookies are domain scoped I guess so that makes sense)	20:23
fungi	perfect. and yeah that's something i'd tested in the original poc as well but good to see the new orchestration and container bits don't change the behavior	20:23
fungi	does creating a new list with your account as the owner generate a notification?	20:25
clarkb	I haven't managed that yet. Need to figure out how to do it from curl	20:25
clarkb	but that is next	20:25
fungi	the cli tools don't work any longer?	20:28
fungi	i thought i'd used those in an earlier 3.x	20:29
clarkb	I'm sure they do but I hate them	20:33
clarkb	mostly because they think that documenting the rest api means "here's some python to run"	20:33
clarkb	and they aren't even commands, it's fire up an interpreter and import some stuff and call a function	20:33
clarkb	I find it extremely clunky and far prefer using something like curl for things like this	20:34
clarkb	I should be able to understand your api without firing up a python interpreter	20:34
fungi	oh, i meant the actual executable scripts like `newlist`	20:35
clarkb	fungi: I didn't realize those existed all the docs examples have you import something.somethingelse.cli	20:37
clarkb	then you call functions out of the cli that way	20:37
fungi	well, docker-compose exec doesn't seem to be viable with these anyway	20:37
clarkb	it isn't interactively since that bug	20:38
clarkb	it did work last week but then runc broke us	20:39
clarkb	anyway list has been created and I've been added as an owner	20:39
fungi	oh, i can actually run some things, looks like	20:39
fungi	no, nevermind, i faked myself out	20:39
fungi	OCI runtime exec failed: exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown	20:39
clarkb	It did not send me an email	20:39
clarkb	yes that's the runc bug	20:39
clarkb	https://github.com/opencontainers/runc/issues/3551	20:40
fungi	ick	20:40
clarkb	anyway it works if you drop -t and just do random commands but that won't work with the mailman3 examples of running an interpreter interactively	20:41
clarkb	it's fine, curl works great :)	20:41
clarkb	but ya no emails generated from that. In the web ui I see different options for the list I own now too so that bit seemed to work	20:41
clarkb	Looks like it doesn't auto sub you to a list when you are owner. But I think that is fine	20:41
fungi	pretty sure mm2 never did either	20:42
fungi	subscribe the owner i mean	20:42
clarkb	now the thing to do is for me to create a user for test@example.com and see if that new user is associated with ownership on the existing list	20:42
clarkb	heh ok that didn't work because test@example.com has no smtp service so exim completed that delivery properly without spooling things?	20:45
clarkb	I'll create a third list with a third email addr as owner and then sign up for that	20:45
clarkb	yup when I make a user and verify the email and login I'm an owner for the list that was precreated with my email as owner	20:49
fungi	$ host -t mx example.com	20:49
fungi	example.com mail is handled by 0 .	20:49
fungi	that may result in some strange mailrouting	20:49
clarkb	fungi: exim just says "I'm done don't need to do anything"	20:49
clarkb	anyway I think this confirms a few important details of mailman3 behavior for us	20:49
clarkb	first is that creating a list and adding an owner to it (at least via the rest api) does not spam the owner. Second if we set the owner before an account exists for that email addr it will auto associate the ownership with that email once the account is created and email is verified	20:50
clarkb	that means we should be able to populate the existing change with all of our lists and set the owners properly in testing now	20:50
fungi	thinking about it a bit more, the notifications are probably no longer necessary anyway, because there's no precreated admin password to send them	20:51
fungi	so this is just fine and dandy	20:51
clarkb	fungi: any objections to me updating the change with all of our current lists and setting the owner to the actual owners given ^?	20:53
fungi	no objection on my part	20:53
fungi	especially since outbound smtp will be blocked from the test node initially anyway, just in case	20:53
clarkb	++	20:54
clarkb	fungi: in mm2 there are mailman@domain lists. Any idea if we still need those?	20:59
clarkb	I'm not sure I ever really understood the functionality of those lists	21:00
opendevreview	Jeremy Stanley proposed openstack/project-config master: Add #openstack-latinamerica to accessbot https://review.opendev.org/c/openstack/project-config/+/852922	21:06
opendevreview	Jeremy Stanley proposed opendev/system-config master: Add IRC logging for #openstack-latinamerica https://review.opendev.org/c/opendev/system-config/+/852923	21:06
fungi	clarkb: those lists are used primarily for the monthly password reminders we disable, but also for things like owner e-mail i think? anyway pretty sure mm3 does not need them	21:08
clarkb	cool I'll go ahead and remove them	21:11
fungi	also when we're getting close to migrating any list domains, we should take the opportunity to check whether there are any more we can/should retire in order to avoid migrating more than necessary	21:13
clarkb	I was wondering about that recently. What is our plan for archives of old lists? Do we need to end up creating them just to give them somewhere to live?	21:15
opendevreview	Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248	21:17
fungi	nope, we can just copy the old pipermail archive tree straight over. for both retired and active lists. that keeps the old urls for the archives working	21:17
clarkb	I'm going to swap out the holds for ^	21:17
fungi	for active lists we'd also import the archives, but keep the old pipermail copies served as well since the url patterns differ and people have linked to them all over	21:18
clarkb	but that adds all the lists to the testing node, removes the password attribute for lists and renames admin to owner to align with the api better	21:18
clarkb	http://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_fac/851248/58/check/system-config-run-lists3/fac28b8/bridge.openstack.org/screenshots/mm3-openstack-main.png that looks good. I checked the exim spool and logs to be extra sure no emails were attempted and it looks good	21:48
fungi	awesome!	21:49
clarkb	I'm going to take a break, but when I get back I'm going to put mailman down for now. The next thing is going to be fussing with users and file perms and all that to see if I can make xapian happy in a reasonable way	21:49
clarkb	and that I expect to be quite consuming	21:49
fungi	having dealt with xapian in a desktop setting years ago, i don't believe it will ever truly be happy	21:50
clarkb	heh. In this case it is because the mailman containers run as uid 100 but I've set things up to have uid something else on the host side. I don't really want to use 100 on the host side because that's _apt or some such	21:51
clarkb	I think what we need to do is find a way to change the uid on the container side (bind mounting in /etc/passwd or doing our own images that update the upstream ones)	21:51
clarkb	none of the options here are particularly good. I'm a bit surprised that the upstream images use such a low uid considering it is almost always going to collide with something on the host system	21:52
clarkb	anyway 23.253.108.60 is the most recently held node if you want to look at it	21:52
fungi	maybe that's a kubernetes assumption showing through	21:52
clarkb	they use docker-compose actually	21:52
fungi	oh, interesting	21:52
fungi	yeah, very odd choice	21:52
clarkb	fungi: re 23.253.108.60 one of the other things to test on the list is list behavior. Can we send email to subscribers, what about dmarc, what about private lists and so on	21:53
clarkb	not sure if you wanted to poke at that but the node is there and ready for it if you have time	21:53
fungi	yep, i should be able to give that stuff a shot and make some notes/observations, though probably tomorrow morning at this point	21:58
clarkb	in theory we should be able to patch in a lot of those settings if we want to make them consistent. I think mailman also has templates and styles (not sure how they differ) that we might be able to setup then apply to lists as necessary	22:01
clarkb	considering everything is already automated we probably don't need to rely on those features much and can directly configure what we want	22:02
clarkb	heh even the register has picked up on the ssh sha1 problems https://www.theregister.com/2022/08/11/red_hat_ssh/	22:03
*** rlandy is now known as rlandy|bbl		22:12
opendevreview	Merged openstack/project-config master: Add starlingx/jenkins-pipelines repo https://review.opendev.org/c/openstack/project-config/+/852919	22:21
clarkb	https://github.com/maxking/docker-mailman/blob/main/web/docker-entrypoint.sh#L148-L150 adds more mystery to the user problems. I would've expected that to address it	22:23
clarkb	I think maybe they removed the bit that allowed uid and gid to be configurable based on that comment. But the chown should've made things work for xapian	22:24
clarkb	I wonder if we can just get away with precreating the dir that xapian wants so that privileged chown will apply to it. I suspect the problem is that xapian is trying to create the dir after it has dropped privs	22:26
opendevreview	Merged openstack/project-config master: Add #openstack-latinamerica to accessbot https://review.opendev.org/c/openstack/project-config/+/852922	22:33
fungi	oh, maybe	23:28
opendevreview	Ian Wienand proposed openstack/project-config master: linter: update ansible-lint; add auto-download of roles https://review.opendev.org/c/openstack/project-config/+/851278	23:29
ianw	gosh xapian is something i haven't heard in a long time	23:36
clarkb	ianw: it is the recommended hyperkitty indexer	23:38
clarkb	the default is something else but apparently the default will change in the next release or something and be xapian	23:38
ianw	last time i used it was with a fairly popular moinmoin wiki for Itanium linux development with Gelato@UNSW	23:39
ianw	linux and UNSW still exist, none of the other bits do :)	23:39
fungi	i suppose it's no worse than mediawiki's search plugin using logstash	23:39
fungi	er, elasticsearch i mean	23:40
fungi	i bet xapian is still using an open source license at least ;)	23:40
opendevreview	Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248	23:41
ianw	"Change 852922 in project openstack/project-config does not share a change queue with 852923 in project opendev/system-config"	23:41
ianw	on https://review.opendev.org/c/opendev/system-config/+/852923 ... that's a new one?	23:42
clarkb	fungi: ^ I put a hold on that latest patchset but didn't delete the older one as I'm not sure I won't break things. I think if this newer one deploys you can use it to debug email stuff just fine. Otherwise fallback to the old one and I'll clean up once we know how things look	23:42
clarkb	ianw: the reporting is new but the error is not.	23:42
clarkb	ianw: it just needs a recheck or reapproval once the depend on has merged	23:43
ianw	clarkb: are you ok with https://review.opendev.org/c/opendev/system-config/+/852793 for now?	23:43
clarkb	ianw: ya I tried to make it clear my -1 was more about getting tripleo to think about their needs instead of blindly choosing something. But if the old upstream mirror is up to date switching to it now should be fine	23:43
clarkb	ianw: I was worried that if I +2'd it would've gotten lost	23:43
clarkb	I also responded to the openstack-discuss thread on that pointing out that we do make changes to our mirror backend stuff and don't really consider it to be a public interface	23:44
clarkb	things like converting pypi from bandersnatch to caching proxy and removing all source packages	23:44
ianw	yeah, i agree we want to do a bit of research before just switching blindly	23:44
ianw	in this case i think we have; i've tried to reach out at least	23:45
fungi	ianw: as for the verified -2, it's a side effect of starting to report the approval of a dependent in a non-shared queue instead of completely ignoring the approval event, with unanticipated fallout for the openstack tenant's "clean check" rule	23:50
opendevreview	Merged openstack/project-config master: linter: update ansible-lint; add auto-download of roles https://review.opendev.org/c/openstack/project-config/+/851278	23:56
opendevreview	Merged opendev/system-config master: system-config-run-borg-backup: rename hosts to distro https://review.opendev.org/c/opendev/system-config/+/852685	23:57
