ianw | rocky 9 is still in a failure loop | 00:08 |
---|---|---|
ianw | its still looking for ntpdate ntp ntp-perl | 00:09 |
ianw | either the project-config change isn't rolled out, or something else is going on | 00:09 |
ianw | i think its the former | 00:10 |
clarkb | did the change to how we deploy project-config land? that was one of my concerns with it that somehow we'd break pushing project-config out | 00:11 |
ianw | oh, heh, it would help if the change was actually merged | 00:11 |
ianw | https://review.opendev.org/c/openstack/project-config/+/852518 | 00:11 |
clarkb | oh heh I can review that | 00:11 |
clarkb | doesn't https://review.opendev.org/c/openstack/project-config/+/852518/2/nodepool/elements/infra-package-needs/install.d/10-packages change from not matching 9-stream to matching 9-stream? | 00:12 |
clarkb | oh hrm that is what the ps1 comments are about | 00:13 |
ianw | actually i think you're right | 00:14 |
ianw | that should be a ! | 00:14 |
clarkb | ya I just tested it locally and 9-stream =~ '9' evaluates to true under [[ | 00:15 |
ianw | and i need to turn it around in the other file too... | 00:15 |
opendevreview | Ian Wienand proposed openstack/project-config master: nodepool: update package maps for Rocky 9 https://review.opendev.org/c/openstack/project-config/+/852518 | 00:17 |
fungi | oh whoops | 00:18 |
clarkb | ianw: I think that update forgot the ! on the second file? | 00:19 |
ianw | for that one we want to match 9 and exit, right? | 00:19 |
clarkb | oh wait no its exit 0 in that condition to skip | 00:19 |
opendevreview | Merged opendev/system-config master: system-config-run: bump base timeout to 3600 https://review.opendev.org/c/opendev/system-config/+/852479 | 00:19 |
opendevreview | Merged opendev/system-config master: Also pin pip/setuptools when creating Xenial venvs https://review.opendev.org/c/opendev/system-config/+/852786 | 00:19 |
clarkb | ya sorry we want similar behavior but achieve it by doing the inverse in the block so the condition should be inverted | 00:19 |
fungi | and yay things finally merged | 00:21 |
ianw | cool. i'll remove the problem venvs, and merge the borg 1.1.18 update, and watch that it deploys | 00:23 |
ianw | it's really only storyboard01 and translate01 | 00:27 |
ianw | lists.openstack.org is totally non-puppeted now, isn't it? | 00:27 |
clarkb | ianw: correct | 00:28 |
fungi | ianw: cacti is our only other bionic server | 00:31 |
fungi | and i guess we don't back it up | 00:31 |
ianw | s/bionic/xenial/ right? | 00:31 |
fungi | yes, sorry, i meant xenial | 00:31 |
ianw | i guess that is on the chopping block for Prometheus | 00:33 |
fungi | right | 00:33 |
fungi | storyboard we still need to add ansible to deploy the containers we build | 00:33 |
fungi | and zanata is... well... zanata just is | 00:34 |
opendevreview | Merged openstack/project-config master: nodepool: update package maps for Rocky 9 https://review.opendev.org/c/openstack/project-config/+/852518 | 00:40 |
ianw | it does seem like http://mirror.facebook.net/centos-stream/9-stream/BaseOS/x86_64/os/repodata/ doesn't have the same contents as http://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/repodata/ | 01:10 |
ianw | Packages/grubby-8.40-59.el9.x86_64.rpm also doesn't exist on the fb mirror, one of the recent updated packages | 01:12 |
ianw | it is @ http://dfw.mirror.rackspace.com/centos-stream/9-stream/BaseOS/x86_64/os/Packages/grubby-8.40-59.el9.x86_64.rpm | 01:14 |
ianw | I sent mail to mirror-admin@lists.fedoraproject.org on 14th april about rsync not working on the rax mirrors | 01:20 |
ianw | i never got a response, but i'm guessing someone fixed something | 01:21 |
ianw | i just got a 502 from gerrit :/ | 01:26 |
ianw | logs have [2022-08-11T01:28:14.933Z] [HTTP-15318535] WARN org.eclipse.jetty.util.thread.QueuedThreadPool : QueuedThreadPool[HTTP]@35bfa7be{STARTED,20<=100<=100,i=0,r=10,q=200}[ReservedThreadExecutor@723aea0f{s=6/10,p=1}] rejected Accept@b7fa454[java.nio.channels.SocketChannel[connected local=/127.0.0.1:8081 remote=/127.0.0.1:50968]] | 01:28 |
ianw | that starts at [2022-08-11T01:26:58.960Z] | 01:30 |
ianw | at [2022-08-11T01:25:48.306Z] we had ERROR com.google.gerrit.httpd.GitOverHttpServlet.GerritUploadPackErrorHandler : Internal error during upload-pack from Repository[/var/gerrit/git/openstack/tacker.git] [CONTEXT project="openstack/tacker" request="GIT_UPLOAD" ] | 01:31 |
ianw | it's over a minute later, so seems unlikely to be related | 01:31 |
ianw | nothing in dmesg | 01:32 |
ianw | 3543612 gerrit2 20 0 121.3g 106.3g 60608 S 30.5 84.5 108119:50 java | 01:33 |
ianw | nothing crazy in cpu/memory usage | 01:33 |
*** rlandy|bbl is now known as rlandy | 01:33 | |
*** rlandy is now known as rlandy|out | 01:36 | |
opendevreview | Merged opendev/system-config master: install-borg: update to borg 1.1.18 https://review.opendev.org/c/opendev/system-config/+/852488 | 01:38 |
ianw | well it seems alive again | 01:43 |
ianw | the error is an overflow of the HTTP incoming requests queue (httpd.maxQueued on your gerrit.config) | 01:46 |
ianw | which seems about right, ssh was working | 01:47 |
Clark[m] | Gerrit is rejecting new connections and Apache returns 502? | 01:47 |
ianw | in summary i'd say, yep | 01:48 |
Clark[m] | We might be able to up those limits given the larger server size. | 01:48 |
Clark[m] | But I guess we need to see if someone needs to be pointed at gitea too | 01:49 |
ianw | in the httpd gerrit logs we have | 01:50 |
*** ysandeep|out is now known as ysandeep | 01:50 | |
ianw | an entry at 2022-08-11T01:26:37.694Z then the next at 2022-08-11T01:28:18.204Z | 01:50 |
ianw | the queuedthreadpool thing started at 01:28:14 | 01:51 |
Clark[m] | Cacti points to a spike in tcp connections but other system resources don't seem to follow | 01:51 |
Clark[m] | I wonder if we have some other pause event (gc?) That causes tcp to backup | 01:52 |
Clark[m] | But if ssh was fine that doesn't seem to line up either | 01:52 |
ianw | # cat sshd_log | grep 2022-08-11T01:26 | grep sf-project-io | grep LOGIN | wc -l | 01:55 |
ianw | 398 | 01:55 |
Clark[m] | If/when it happens again dumping cache stats which includes jvm internal memory info may be helpful | 01:55 |
ianw | it seems like sf-project-io logged in 398 times in one minute | 01:56 |
Clark[m] | We may be in a gc loop or maxed out in the jvm on memory | 01:56 |
Clark[m] | Oh weird | 01:56 |
ianw | i wonder if that ran us out of tcp ... something and ended up hanging the webserver bits | 01:56 |
Clark[m] | And that would push us over the limit. We have limits on the number of connections per IP and per user but maybe they aren't sufficient here for some reason | 01:56 |
ianw | that is the most suspicious thing i can see | 01:58 |
Clark[m] | Maybe Tristan can help track down what might've caused that on their end | 01:59 |
ianw | tristanC: ^ any idea why this would have had a big loop of logins at this time? | 01:59 |
ianw | in better news storyboard & translate have functioning borg venvs now, with 1.1.18 | 02:06 |
opendevreview | Merged opendev/system-config master: letsencrypt-acme-sh-install: handle errors better in driver https://review.opendev.org/c/opendev/system-config/+/696211 | 02:47 |
*** pojadhav|out is now known as pojadhav|rover | 02:49 | |
opendevreview | Merged opendev/system-config master: letsencrypt: make acme.sh exits clearer https://review.opendev.org/c/opendev/system-config/+/850435 | 02:49 |
*** ysandeep is now known as ysandeep|afk | 02:57 | |
*** ysandeep|afk is now known as ysandeep | 03:10 | |
*** ysandeep is now known as ysandeep|away | 03:25 | |
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run-borg-backup: rename hosts to distro https://review.opendev.org/c/opendev/system-config/+/852685 | 03:33 |
opendevreview | Ian Wienand proposed openstack/project-config master: infra-package-needs: blank out coreutils for Rocky 9 https://review.opendev.org/c/openstack/project-config/+/852798 | 03:38 |
opendevreview | Merged openstack/project-config master: infra-package-needs: blank out coreutils for Rocky 9 https://review.opendev.org/c/openstack/project-config/+/852798 | 03:59 |
*** ysandeep|away is now known as ysandeep | 04:13 | |
ianw | hrm, https://review.opendev.org/c/opendev/system-config/+/852799 is there, but wasn't announced ^ | 04:14 |
ianw | and i don't think gerrit has picked it up | 04:14 |
ianw | s/gerrit/zuul/ | 04:18 |
ianw | [2022-08-11T04:00:19.374Z] [SSH git-receive-pack /opendev/system-config.git (iwienand)] WARN com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker killed after 245282ms, cancelled (timeout=5282ms, task=RECEIVE_COMMITS(Processing changes)) | 04:22 |
ianw | guess what was happening at 04:00... | 04:23 |
ianw | cat sshd_log | grep 2022-08-11T04:01 | grep sf-project-io | grep LOGIN | wc -l | 04:24 |
ianw | 98 | 04:24 |
ianw | it's more spaced out, but a lot of sf-project-io logins | 04:24 |
opendevreview | Merged opendev/system-config master: system-config-run-borg-backup: add jammy test host https://review.opendev.org/c/opendev/system-config/+/852489 | 05:32 |
opendevreview | Merged opendev/system-config master: gate-groups: remove old backup group https://review.opendev.org/c/opendev/system-config/+/852684 | 05:36 |
*** marios is now known as marios|ruck | 05:38 | |
*** ysandeep is now known as ysandeep|afk | 06:04 | |
ianw | 2022-08-11 06:16:57.023 | Build completed successfully ... rocky 9 finally worked | 06:22 |
*** ysandeep|afk is now known as ysandeep | 06:45 | |
*** jpena|off is now known as jpena | 06:57 | |
*** ysandeep is now known as ysandeep|lunch | 08:21 | |
*** tosky_ is now known as tosky | 09:33 | |
*** ysandeep|lunch is now known as ysandeep | 10:12 | |
*** rlandy|out is now known as rlandy | 10:38 | |
*** ysandeep is now known as ysandeep|afk | 10:55 | |
*** dviroel|out is now known as dviroel | 11:31 | |
*** ysandeep|afk is now known as ysandeep | 11:45 | |
tristanC | ianw: i guess it's caused by a zuul's periodic trigger event. Is there something we could do about it? | 12:23 |
*** efoley_ is now known as efoley | 12:33 | |
fungi | tristanC: maybe some of the newer jitter options for the timer trigger, if your zuul is new enough? | 12:48 |
tristanC | fungi: thanks, i guess we'll need to update zuul to 6.2.0 | 13:16 |
fungi | that's just one idea, others may have different suggestions | 13:20 |
*** rcastillo is now known as rcastillo|rover | 13:57 | |
Clark[m] | tristanC: fungi: opendev's periodic jobs don't seem to cause the same issues. Maybe compare the pipeline definitions and zuul connection settings for Gerrit? | 13:58 |
*** pojadhav|rover is now known as pojadhav|afk | 14:19 | |
fungi | setuptools 64.0.0 is out, as is pbr 5.10.0 | 14:23 |
fungi | so keep an eye out for anything possibly related | 14:25 |
fungi | https://setuptools.pypa.io/en/latest/history.html#v64-0-0 | 14:25 |
fungi | the big change in setuptools is the pep 660 editable installs implementation | 14:25 |
fungi | so it could impact testing if projects have tox set to do editable by default, though the new stuff *should* only actually kick in for projects also doing pep 517 builds (via pyproject.toml configuration) | 14:27 |
fungi | "Added ability of collecting source files from custom build sub-commands to sdist. This allows plugins and customization scripts to automatically add required source files in the source distribution." | 14:28 |
fungi | that might actually come in handy for pbr | 14:28 |
JayF | oh that's awesome, I've been waiting for setuptools to get that editable implementation | 14:31 |
Clark[m] | tristanC: fungi: another thing we should rule out is sf hitting the login limit which causes immediate retries and a thundering herd. I don't have evidence of this just occurred to me that it could happen if the software retries aggressively | 14:35 |
*** pojadhav|afk is now known as pojadhav | 15:14 | |
*** marios|ruck is now known as marios|out | 15:30 | |
*** pojadhav is now known as pojadhav|out | 15:31 | |
clarkb | 'failed to open /etc/mailman/sites for linear search: No such file or directory' is the exim error for ianw's mm3 signup problem. I think that is an artifact of our old vhosting setup under mm2. I'll take a look at cleaning that up. I also notice that we have errors creating xapian indexes due to user in the upstream containers not aligning with our containers | 15:33 |
clarkb | er not aligning with our hosts. I think that the expectation is those containers do start as root then they change their process ownership to the baked in mailman user. But its uid is 100 :/ | 15:34 |
clarkb | I'm not sure what the best way to handle that is. Maybe we can bind mount an /etc/passwd that changes the uid to align with what we want? | 15:34 |
fungi | yes, we have a custom exim router on our mm2 servers to look up which mailing list chroot to use based on the domain | 15:34 |
fungi | mm3 shouldn't need that | 15:34 |
clarkb | oh to make the uid/gid situation worse the mailman-web and mailman-core gids are different | 15:36 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 15:48 |
clarkb | fungi: something like ^ based on the contents of: https://github.com/maxking/docker-mailman/tree/main/core/assets/exim ? | 15:48 |
clarkb | I'm going to cycle out the held node for ^ | 15:48 |
*** ysandeep is now known as ysandeep|dinner | 15:48 | |
clarkb | old hold deleted and new one created | 15:49 |
fungi | clarkb: yeah, the simple config examples there ought to be valid for our case as well | 15:49 |
clarkb | I'm still not sure what the best way to address the user mismatch is between host and containers | 15:50 |
clarkb | we could configure docker to do the user offset thing and then set perms outside the container appropriately | 15:50 |
clarkb | We could build our own image based on the upstream image that changes the uid and gid and chowns everything | 15:50 |
clarkb | But one problem at a time :) | 15:51 |
clarkb | apparently making a new docker image to update /etc/passwd and /etc/group and chowning everything is a common choice here :/ | 15:54 |
clarkb | I guess if ^ works then my change may actually send some emails now. To nobody@openstack.org and test@example.com. That should be fine | 15:59 |
fungi | i thought we added firewall rules to prevent those from actually being delivered | 16:00 |
fungi | if memory serves, i did that a while back when we were increasing our test coverage for ml server deployments | 16:01 |
fungi | so that we could safely exercise the mta | 16:01 |
clarkb | if we did I'm not finding them | 16:03 |
clarkb | we set the "don't send email" flag on the list creation command under mm2 | 16:03 |
*** dviroel is now known as dviroel|lunch | 16:04 | |
clarkb | we run the service-lists.yaml playbook in the mm2 job and it creates the mailing lists. That test has template files for host vars for lists.o.o and lists.kc.io but neither seem to set up special iptables rules | 16:04 |
clarkb | I guess we could've done it globally on all test nodes /me looks | 16:04 |
fungi | mmm, yeah i'm trying to find where/when i remembered doing that | 16:05 |
clarkb | aha that is where the rules live | 16:05 |
clarkb | system-config/playbooks/zuul/templates/group_vars/all.yaml.j2 | 16:05 |
fungi | aha, yep that was last year in https://review.opendev.org/820900 | 16:05 |
clarkb | looks like we allow the port 25 connection over localhost but then reject it otherwise | 16:05 |
fungi | right, that way mailman can send to exim, but exim can't send out | 16:06 |
clarkb | that explains why ianw's emails hit exim but if the config wasn't broken would've been blocked from there | 16:06 |
clarkb | ok cool | 16:06 |
fungi | i added it precisely for this case, and in preparation for the mm3 work | 16:06 |
clarkb | in that case my fixed up change should have errors in exim sending email out, but not config related errors | 16:07 |
opendevreview | Merged openstack/project-config master: End project gating for openstack-helm-addons https://review.opendev.org/c/openstack/project-config/+/851857 | 16:07 |
clarkb | fungi: I guess if/when we want to test email we'd do that in a controlled setting removing the iptables rule and then trying to sign up with one of our email addrs? | 16:08 |
clarkb | similar to when we'd like to test a list's behavior | 16:08 |
fungi | right | 16:09 |
fungi | that way it's explicitly under our control | 16:09 |
fungi | down the road, if we wanted automated testing for something like that in a job, we could add pass rules in the firewall for another job node | 16:10 |
clarkb | I guess we have to be careful that any buffered messages don't all get out when we drop the iptables rule too | 16:10 |
*** jpena is now known as jpena|off | 16:10 | |
fungi | yes, `exim4 -Mrm ...` should allow us to delete them | 16:11 |
fungi | `exim4 -bp` to list | 16:11 |
clarkb | cool | 16:11 |
*** ysandeep|dinner is now known as ysandeep|out | 16:18 | |
clarkb | ok job for that latest patchset completed and the node is held (104.130.172.61). There is only an exim mainlog. No rejectlog (or error log I forget what the full set is). That implies to me that maybe we didn't try to send email at all? | 16:24 |
*** gibi is now known as gibi_pto | 16:24 | |
clarkb | `exim4 -bp` returns no results fwiw | 16:25 |
clarkb | I guess the next steps are probably to continue trying to sign up for an account on the server through the web ui and see what that does as far as sending email. Then create a mailing list with our account as owner to see if we get emailed? | 16:26 |
clarkb | ok new error: sender verify fail for <postorius@lists.opendev.org>: Unrouteable address | 16:33 |
clarkb | I guess I need to update the exim config to make that valid? | 16:33 |
clarkb | something in the mailman_verp_router I expect. But I don't understand it well enough to know what we should change | 16:34 |
clarkb | senders = "*-bounces@*" <- does that need to be updated? | 16:35 |
clarkb | also feel free to update the change. I think you understand this stuff a lot better than I do | 16:36 |
clarkb | and ya the rejectlog existing now after my error would imply to me that maybe we aren't trying to send email when creating lists or adding list owners. Still needs better testing of that behavior, but that is encouraging | 16:37 |
fungi | yeah, so exim is currently configured to verify sender addresses on receipt | 16:43 |
fungi | need to think about what the postorius address is used for and how we'll route it | 16:43 |
clarkb | fungi: that error was generated by me trying to sign up for an account on the test server. It is used to send the email verification message at least | 16:45 |
fungi | yep, just wondering what else it might get used for | 16:45 |
fungi | as soon as i get lunch cleared away i'll check the docs to see if they say | 16:46 |
clarkb | good point. The example exim config from the docker image repo shows -bounces -etc seem to align with mm2 | 16:46 |
clarkb | so I don't think it is used for list management. But it could be that is an incomplete listing | 16:46 |
fungi | looking at mm2 messages, we get notifications from mailman-owner@ for creation of new lists, from $foo-request@ when subscribing to a new list | 16:52 |
fungi | system-wide account creation doesn't really fall into the same sort of category though | 16:52 |
clarkb | the new require_files I pushed is buggy too I just realized. I gave it the in container path of the bind mount but exim is external to the containers so it needs the host side of the bind mount path | 16:54 |
fungi | web searches would work better if i didn't constantly try to add a third "o" to "postorius" | 16:54 |
fungi | i'll have to get back to this after lunch though | 16:55 |
*** dviroel|lunch is now known as dviroel | 16:55 | |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 16:57 |
clarkb | that fixes the bind mount side confusion | 16:57 |
opendevreview | Merged openstack/project-config master: Add official-openstack-repo-jobs to openstack k8s charms https://review.opendev.org/c/openstack/project-config/+/852117 | 16:59 |
clarkb | corvus: re software factory hitting review.opendev.org with a bunch of connections all at once, is there anything that tristanC should be looking at other than pipeline jitter for periodic jobs? I don't think that openstack periodic jobs have similar issues, and they run quite a number of jobs too. Is it possible that node request timing delays in opendev are naturally breaking up | 17:23 |
clarkb | those requests for git repos maybe? We merge before the node is ready but once an executor is assigned and nodes are ready we do a merge again for the job itself | 17:23 |
clarkb | If sf is running all of those jobs out of containers that don't have provisioning delays I wonder if we're compacting all the requests into a much shorter period of time than in opendev | 17:23 |
fungi | oh, i hadn't made that connection, but yeah that would make sense | 17:29 |
clarkb | iirc the flow is event comes in and we do a merge for each event to determine which jobs to run. Then when job is ready to run on an executor that executor does the merge again for each job. In opendev the randomness of node provisioning likely acts as a good smoothing system for load on gerrit requests. But if your provisioning is near instant because it is effectively already | 17:31 |
clarkb | provisioned and you are just reserving a slice of it then you'll potentially generate a lot of merges in a much shorter period of time | 17:31 |
fungi | the mm3/postorius docs are rather circular | 17:38 |
fungi | postorius: "for config instructions see mm3 docs" | 17:39 |
fungi | mm3: "postorius is the management interface, see its docs for further info" | 17:39 |
fungi | https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/docs/postorius.html | 17:40 |
clarkb | The mm3 docs have been painful to work with. Their rest docs don't really show you much about the requests and hide it all behind helper functions they've written | 17:43 |
clarkb | Which is fine to also document. But the API should be more explicit imo | 17:43 |
fungi | the main thing i want to figure out is if there is any expectation that you can send control messages to the postorius address, or whether we need to just nullroute it | 17:48 |
clarkb | Based on the exim4 configs in the mm3 docker repo I expect we can nullroute it otherwise they would've called the addr out there | 17:49 |
fungi | that's my assumption too, just trying to find any indication that it could be a bad assumption | 17:49 |
fungi | looking at acl_check_rcpt in our current exim4.conf, we automatically accept messages with a null host (comments say that's local injection), and also anything for postmaster at local domains, then apply sender verification to anything else | 17:55 |
fungi | i guess messages originating from within containers may be breaking that assumption and we might need to add some additional exceptions for loopback or whatever? | 17:57 |
clarkb | is it looking at the tcp connection details or the sender in the headers? | 17:58 |
clarkb | in this case it should be a local tcp connection | 17:58 |
clarkb | the error specifically says sender <postorius@lists.opendev.org> couldn't be verified which makes me think the issue is actually at a header level? | 17:59 |
clarkb | (if it wasn't a local tcp connection then our iptables rule would've blocked it) | 17:59 |
fungi | well, the address it's verifying is the sender specified in the smtp protocol header, yes | 18:02 |
fungi | the exclusions before that verify rule could be based on tcp connection information though | 18:02 |
fungi | the "require verify = sender" line in acl_check_rcpt: | 18:05 |
fungi | the main exclusion before that rule seems to be "accept hosts = :" which the comments say is checking for local mail injection, but i guess that's from calling /usr/lib/sendmail or the like | 18:06 |
fungi | otherwise the host would be 127.0.0.1 or ::1 | 18:06 |
fungi | or localhost or something like that | 18:07 |
clarkb | the comment says " Accept if the source is local SMTP (i.e. not over TCP/IP)." | 18:07 |
clarkb | that would be dropping files into the correct location? This is smtp though over localhost:25 | 18:07 |
clarkb | oh and localdomains would be lists99.opendev.org not lists.opendev.org or lists.openstack.org etc | 18:08 |
clarkb | We could set exim_local_domains? But I'm not sure that is the right way to solve this. Might be better to make the address verifiable, but I'm not sure what that requires | 18:09 |
fungi | probably adding postorius to /etc/aliases would suffice | 18:10 |
fungi | we could manually stuff it into /etc/aliases on the real lists.o.o if we wanted, assuming the test server doesn't consider it to be local | 18:12 |
clarkb | is the test server querying exim on the prod lists.opendev.org server to verify that? | 18:12 |
fungi | in theory, yes, unless it thinks lists.opendev.org is a local domain | 18:13 |
clarkb | hrm we add lists.opendev.org to mm_domains which is added to exim_local_domains | 18:15 |
clarkb | oh but we only allow postmaster at the local domain | 18:16 |
clarkb | so ya I think it would be sufficient to add it to /etc/aliases on the test server | 18:16 |
*** tosky_ is now known as tosky | 18:16 | |
fungi | yeah, domainlist localdomains in the exim4.conf has it | 18:17 |
fungi | er, local_domains | 18:17 |
clarkb | fungi: https://review.opendev.org/c/opendev/system-config/+/851248/56/playbooks/zuul/files/host_vars/lists99.opendev.org.yaml line 42 or so we add an entry? I'm not sure what we should alias it to though | 18:18 |
fungi | we could alias it to one of the magic addresses like :fail: | 18:19 |
clarkb | fungi: something like: ' postorius: :fail: Outgoing email only from this address' ? | 18:20 |
fungi | it's worth a try. you could add it locally on the server and test | 18:20 |
clarkb | ok let me try that | 18:20 |
fungi | just stick that line in /etc/aliases | 18:20 |
clarkb | heh "a user is already registered with this address" Not a good failure method if the server 500s but creates the records anyway. /me tries another address | 18:22 |
fungi | could probably also delete the broken account | 18:22 |
clarkb | sender verify fail for <postorius@lists.opendev.org>: outgoing email only from this address | 18:23 |
fungi | though also, maybe password recovery process works with the account | 18:23 |
clarkb | that means /etc/aliases did modify the behavior but :fail: isn't what we want | 18:23 |
fungi | okay, so 1. we know it's checking the local delivery, and 2. we can't :fail: it | 18:23 |
fungi | :blackhole: maybe? | 18:24 |
fungi | https://www.exim.org/exim-html-3.20/doc/html/spec_23.html#SEC634 | 18:24 |
corvus | clarkb: tristanC: jitter would be the main thing. https://review.opendev.org/848516 could potentially help in some cases (less likely to help with periodic jobs, but otherwise very good for a 3pci system to cut down on its impact). | 18:24 |
clarkb | fungi: that worked. It claims email was sent to me | 18:25 |
fungi | interesting! | 18:26 |
corvus | blackhole is ~= delivering to /dev/null | 18:26 |
corvus | successfully | 18:26 |
fungi | yeah, mainly just confirming that's effective for bypassing sender verification | 18:26 |
clarkb | corvus: ya in this case mm3 uses postorius@listdomain to send email verification emails to people doing account signups | 18:26 |
clarkb | corvus: those emails were getting rejected on sender verification by exim. Will it blackhole the email even if it is the sender? or just if it is the recipient? | 18:27 |
corvus | just rcpt | 18:27 |
clarkb | I didn't get the email but didn't expect to due to our iptables rules so hard to say why it wasn't delivered. | 18:27 |
clarkb | Ah ok so this is probably workable (for now anyway) | 18:27 |
clarkb | as an alternative I could alias it to mailman which is a local user | 18:28 |
fungi | a cleaner option might be to configure exim to skip checking the postorius sender address and then add an explicit :fail: message for it for cases where users try to send something to or reply to that address | 18:28 |
fungi | rather than accepting messages for it and throwing them away | 18:28 |
fungi | but this is sufficient for now | 18:28 |
clarkb | I can add a TODO to the yaml config to improve the situation. Will push a patch up shortly | 18:28 |
clarkb | in the exim mainlog I see it failing to connect to send that email indicating our iptables rules are working | 18:29 |
fungi | perfect. broken as designed! ;) | 18:29 |
clarkb | and there are definitely no logs like that for nobody@openstack.org or test@example.com implying that creating lists and adding owners isn't generating emails to them | 18:30 |
clarkb | But I want to do more testing of that before we roll with it | 18:30 |
fungi | yeah, for the test list creation we pass a flag to tell it not to send a notification | 18:31 |
fungi | or at least we did at one point | 18:31 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 18:32 |
clarkb | fungi: thats a mm2 only thing as far as I can tell | 18:32 |
fungi | oh, got it | 18:32 |
clarkb | the mm3 rest api doesnt appear to have toggles for that sort of thing | 18:32 |
clarkb | I'm going to update the hold to catch that latest update | 18:33 |
clarkb | it might also be some sort of situation where if an account matching that email address exists and the email addr is verified then you'll get emailed | 18:34 |
clarkb | But when boot strapping lists like this with no accounts it won't send email to unverified addresses | 18:35 |
clarkb | If that is the case then the next thing we need to sort out is whether or not adding an account for that email address later will properly associate the ownership to the list | 18:35 |
clarkb | I think we can test this by creating a user to match one of those email addresses, then using django admin interface to manually verify the email then see if we are a list admin | 18:36 |
tristanC | corvus: clarkb: ianw: ok thank you, we'll update zuul and setup the jitter settings to reduce the number of connections opened by sf-project-io | 18:38 |
clarkb | fungi: I've got to do lunch and drop the kids off after, but is there a way to read the unsent email off disk from exim? If so I should be able to follow the link that postorius sends out to verify my account that way without the email actually getting delivered | 18:55 |
corvus | clarkb: /var/spool/exim | 18:57 |
clarkb | corvus: thanks! | 18:58 |
corvus | there's a pair of files for each msg in the queue (suffixed with -D and -H) | 18:58 |
fungi | yep, that exactly | 18:58 |
corvus | filename is queue id which shows up in the logs | 18:58 |
fungi | also the same queue id which shows up in `exim4 -bp` or `mailq` output | 18:59 |
fungi | in a parallel directory you'll also find an ephemeral delivery log for that queue item | 19:00 |
clarkb | great that should allow me to check account verification without worrying about spam filters and iptables. | 19:01 |
clarkb | And then after that I think I'll try creating another list and make myself an owner and see if it tries to send email to me | 19:02 |
clarkb | and maybe also create another list for an uncreated user and see if it tries to send email to them | 19:02 |
fungi | yeah, i agree, not sending to addresses which don't have an account may be a safety measure in order to avoid becoming as much of a potential spam source | 19:03 |
fungi | i think bulk account creation is part of the list subscriber migration | 19:03 |
fungi | idea being people will get new mm3/postorius accounts the first time a list they're subbed to is imported, and if they want to use it they can do a password reset dance through the webui | 19:04 |
fungi | we would presumably do the same for list owners/moderators | 19:06 |
clarkb | it definitely lets me add an owner email address when that email addr doesn't have an associated account yet | 19:06 |
clarkb | that is what the current change does, it sets owner to test@example.com | 19:06 |
fungi | hopefully once the account for that address is created the user will be able to manage the list | 19:06 |
clarkb | yup exactly | 19:07 |
*** | dviroel is now known as dviroel\|afk | 19:07 |
fungi | and unlike mm2, no more shared passwords for owner/mod access, it's just associated with your account (so you have a login to postorius which gets you access to manage all the lists you're associated with, as well as your subscription settings for any to which you're subscribed) | 19:09 |
clarkb | 158.69.70.114 is the new host fwiw | 19:14 |
opendevreview | Davlet Panech proposed openstack/project-config master: Add starlingx/jenkins-pipelines repo https://review.opendev.org/c/openstack/project-config/+/852919 | 19:47 |
clarkb | I have successfully created an account on that server by grabbing the confirmation url out of /var/spool/exim4/input | 20:21 |
clarkb | I just tested that a single account is valid across multiple vhosts. Though I had to log in again | 20:22 |
clarkb | (the cookies are domain scoped I guess so that makes sense) | 20:23 |
fungi | perfect. and yeah that's something i'd tested in the original poc as well but good to see the new orchestration and container bits don't change the behavior | 20:23 |
fungi | does creating a new list with your account as the owner generate a notification? | 20:25 |
clarkb | I haven't managed that yet. Need to figure out how to do it from curl | 20:25 |
clarkb | but that is next | 20:25 |
fungi | the cli tools don't work any longer? | 20:28 |
fungi | i thought i'd used those in an earlier 3.x | 20:29 |
clarkb | I'm sure they do but I hate them | 20:33 |
clarkb | mostly because they think that documenting the rest api means "here's some python to run" | 20:33 |
clarkb | and they aren't even commands, it's fire up an interpreter and import some stuff and call a function | 20:33 |
clarkb | I find it extremely clunky and far prefer using something like curl for things like this | 20:34 |
clarkb | I should be able to understand your api without firing up a python interpreter | 20:34 |
fungi | oh, i meant the actual executable scripts like `newlist` | 20:35 |
clarkb | fungi: I didn't realize those existed all the docs examples have you import something.somethingelse.cli | 20:37 |
clarkb | then you call functions out of the cli that way | 20:37 |
fungi | well, docker-compose exec doesn't seem to be viable with these anyway | 20:37 |
clarkb | it isn't interactively since that bug | 20:38 |
clarkb | it did work last week but then runc broke us | 20:39 |
clarkb | anyway list has been created and I've been added as an owner | 20:39 |
fungi | oh, i can actually run some things, looks like | 20:39 |
fungi | no, nevermind, i faked myself out | 20:39 |
fungi | OCI runtime exec failed: exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown | 20:39 |
clarkb | It did not send me an email | 20:39 |
clarkb | yes thats the runc bug | 20:39 |
clarkb | https://github.com/opencontainers/runc/issues/3551 | 20:40 |
fungi | ick | 20:40 |
clarkb | anyway it works if you drop -t and just do random commands but that won't work with the mailman3 examples of running an interpreter interactively | 20:41 |
clarkb | its fine curl works great :) | 20:41 |
clarkb | but ya no emails generated from that. In the web ui I see different options for the list I own now too so that bit seemed to work | 20:41 |
clarkb | Looks like it doesn't auto sub you to a list when you are owner. But I think that is fine | 20:41 |
fungi | pretty sure mm2 never did either | 20:42 |
fungi | subscribe the owner i mean | 20:42 |
clarkb | now the thing to do is for me to create a user for test@example.com and see if that new user is associated with ownership on the existing list | 20:42 |
clarkb | heh ok that didn't work because test@example.com has no smtp service so exim completed that delivery properly without spooling things? | 20:45 |
clarkb | I'll create a third list with a third email addr as owner and then sign up for that | 20:45 |
clarkb | yup when I make a user and verify the email and login I'm an owner for the list that was precreated with my email as owner | 20:49 |
fungi | $ host -t mx example.com | 20:49 |
fungi | example.com mail is handled by 0 . | 20:49 |
fungi | that may result in some strange mailrouting | 20:49 |
clarkb | fungi: exim just says "I'm done don't need to do anything" | 20:49 |
clarkb | anyway I think this confirms a few important details of mailman3 behavior for us | 20:49 |
clarkb | first is that creating a list and adding an owner to it (at least via the rest api) does not spam the owner. Second if we set the owner before an account exists for that email addr it will auto associate the ownership with that email once the account is created and email is verified | 20:50 |
clarkb | that means we should be able to populate the existing change with all of our lists and set the owners properly in testing now | 20:50 |
fungi | thinking about it a bit more, the notifications are probably no longer necessary anyway, because there's no precreated admin password to send them | 20:51 |
fungi | so this is just fine and dandy | 20:51 |
clarkb | fungi: any objections to me updating the change with all of our current lists and setting the owner to the actual owners given ^? | 20:53 |
fungi | no objection on my part | 20:53 |
fungi | especially since outbound smtp will be blocked from the test node initially anyway, just in case | 20:53 |
clarkb | ++ | 20:54 |
clarkb | fungi: in mm2 there are mailman@domain lists. Any idea if we still need those? | 20:59 |
clarkb | I'm not sure I ever really understood the functionality of those lists | 21:00 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Add #openstack-latinamerica to accessbot https://review.opendev.org/c/openstack/project-config/+/852922 | 21:06 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add IRC logging for #openstack-latinamerica https://review.opendev.org/c/opendev/system-config/+/852923 | 21:06 |
fungi | clarkb: those lists are used primarily for the monthly password reminders we disable, but also for things like owner e-mail i think? anyway pretty sure mm3 does not need them | 21:08 |
clarkb | cool I'll go ahead and remove them | 21:11 |
fungi | also when we're getting close to migrating any list domains, we should take the opportunity to check whether there are any more we can/should retire in order to avoid migrating more than necessary | 21:13 |
clarkb | I was wondering about that recently. What is our plan for archives of old lists? Do we need to end up creating them just to give them somewhere to live? | 21:15 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 21:17 |
fungi | nope, we can just copy the old pipermail archive tree straight over. for both retired and active lists. that keeps the old urls for the archives working | 21:17 |
clarkb | I'm going to swap out the holds for ^ | 21:17 |
fungi | for active lists we'd also import the archives, but keep the old pipermail copies served as well since the url patterns differ and people have linked to them all over | 21:18 |
clarkb | but that adds all the lists to the testing node, removes the password attribute for lists and renames admin to owner to align with the api better | 21:18 |
clarkb | http://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_fac/851248/58/check/system-config-run-lists3/fac28b8/bridge.openstack.org/screenshots/mm3-openstack-main.png that looks good. I checked the exim spool and logs to be extra sure no emails were attempted and it looks good | 21:48 |
fungi | awesome! | 21:49 |
clarkb | I'm going to take a break, but when I get back I'm going to put mailman down for now. The next thing is going to be fussing with users and file perms and all that to see if I can make xapian happy in a reasonable way | 21:49 |
clarkb | and that I expect to be quite time consuming | 21:49 |
fungi | having dealt with xapian in a desktop setting years hence, i don't believe it will ever truly be happy | 21:50 |
clarkb | heh. In this case it is because the mailman containers run as uid 100 but I've set things up to have uid something else on the host side. I don't really want to use 100 on the host side because that's _apt or some such | 21:51 |
clarkb | I think what we need to do is find a way to change the uid on the container side (bind mounting in /etc/passwd or doing our own images that update the upstream ones) | 21:51 |
clarkb | none of the options here are particularly good. I'm a bit surprised that the upstream images use such a low uid considering it is almost always going to collide with something on the host system | 21:52 |
clarkb | anyway 23.253.108.60 is the most recently held node if you want to look at it | 21:52 |
fungi | maybe that's a kubernetes assumption showing through | 21:52 |
clarkb | they use docker-compose actually | 21:52 |
fungi | oh, interesting | 21:52 |
fungi | yeah, very odd choice | 21:52 |
clarkb | fungi: re 23.253.108.60 one of the other things to test on the list is list behavior. Can we send email to subscribers, what about dmarc, what about private lists and so on | 21:53 |
clarkb | not sure if you wanted to poke at that but the node is there and ready for it if you have time | 21:53 |
fungi | yep, i should be able to give that stuff a shot and make some notes/observations, though probably tomorrow morning at this point | 21:58 |
clarkb | in theory we should be able to patch in a lot of those settings if we want to make them consistent. I think mailman also has templates and styles (not sure how they differ) that we might be able to setup then apply to lists as necessary | 22:01 |
clarkb | considering everything is already automated we probably don't need to rely on those features much and can directly configure what we want | 22:02 |
clarkb | heh even the register has picked up on the ssh sha1 problems https://www.theregister.com/2022/08/11/red_hat_ssh/ | 22:03 |
*** | rlandy is now known as rlandy\|bbl | 22:12 |
opendevreview | Merged openstack/project-config master: Add starlingx/jenkins-pipelines repo https://review.opendev.org/c/openstack/project-config/+/852919 | 22:21 |
clarkb | https://github.com/maxking/docker-mailman/blob/main/web/docker-entrypoint.sh#L148-L150 adds more mystery to the user problems. I would've expected that to address it | 22:23 |
clarkb | I think maybe they removed the bit that allowed uid and gid to be configurable based on that comment. But the chown should've made things work for xapian | 22:24 |
clarkb | I wonder if we can just get away with precreating the dir that xapian wants so that privileged chown will apply to it. I suspect the problem is that xapian is trying to create the dir after it has dropped privs | 22:26 |
opendevreview | Merged openstack/project-config master: Add #openstack-latinamerica to accessbot https://review.opendev.org/c/openstack/project-config/+/852922 | 22:33 |
fungi | oh, maybe | 23:28 |
opendevreview | Ian Wienand proposed openstack/project-config master: linter: update ansible-lint; add auto-download of roles https://review.opendev.org/c/openstack/project-config/+/851278 | 23:29 |
ianw | gosh xapian is something i haven't heard in a long time | 23:36 |
clarkb | ianw: it is the recommended hyperkitty indexer | 23:38 |
clarkb | the default is something else but apparently the default will change in the next release or something and be xapian | 23:38 |
ianw | last time i used it was with a fairly popular moinmoin wiki for Itanium linux development with Gelato@UNSW | 23:39 |
ianw | linux and UNSW still exist, none of the other bits do :) | 23:39 |
fungi | i suppose it's no worse than mediawiki's search plugin using logstash | 23:39 |
fungi | er, elasticsearch i mean | 23:40 |
fungi | i bet xapian is still using an open source license at least ;) | 23:40 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 23:41 |
ianw | "Change 852922 in project openstack/project-config does not share a change queue with 852923 in project opendev/system-config" | 23:41 |
ianw | on https://review.opendev.org/c/opendev/system-config/+/852923 ... that's a new one? | 23:42 |
clarkb | fungi: ^ I put a hold on that latest patchset but didn't delete the older one as I'm not sure I won't break things. I think if this newer one deploys you can use it to debug email stuff just fine. Otherwise fallback to the old one and I'll clean up once we know how things look | 23:42 |
clarkb | ianw: the reporting is new but the error is not. | 23:42 |
clarkb | ianw: it just needs a recheck or reapproval once the depend on has merged | 23:43 |
ianw | clarkb: are you ok with https://review.opendev.org/c/opendev/system-config/+/852793 for now? | 23:43 |
clarkb | ianw: ya I tried to make it clear my -1 was more about getting tripleo to think about their needs instead of blindly choosing something. But if the old upstream mirror is up to date switching to it now should be fine | 23:43 |
clarkb | ianw: I was worried that if I +2'd it would've gotten lost | 23:43 |
clarkb | I also responded to the openstack-discuss thread on that pointing out that we do make changes to our mirror backend stuff and don't really consider it to be a public interface | 23:44 |
clarkb | things like converting pypi from bandersnatch to caching proxy and removing all source packages | 23:44 |
ianw | yeah, i agree we want to do a bit of research before just switching blindly | 23:44 |
ianw | in this case i think we have; i've tried to reach out at least | 23:45 |
fungi | ianw: as for the verified -2, it's a side effect of starting to report the approval of a dependent in a non-shared queue instead of completely ignoring the approval event, with unanticipated fallout for the openstack tenant's "clean check" rule | 23:50 |
opendevreview | Merged openstack/project-config master: linter: update ansible-lint; add auto-download of roles https://review.opendev.org/c/openstack/project-config/+/851278 | 23:56 |
opendevreview | Merged opendev/system-config master: system-config-run-borg-backup: rename hosts to distro https://review.opendev.org/c/opendev/system-config/+/852685 | 23:57 |
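
A note on the spool trick discussed between 18:55 and 20:21 (reading queued mail from `/var/spool/exim4/input` to grab the postorius confirmation link): it could be sketched roughly as below. This is only an illustration, not what was actually run on the held node. The function name `find_urls_in_spool` and the URL regex are hypothetical, and the sketch assumes the body stored in each `-D` file is plain text rather than quoted-printable or base64 encoded.

```python
import re
from pathlib import Path

# Hypothetical, deliberately loose URL pattern; enough to spot a confirmation link.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def find_urls_in_spool(spool_dir):
    """Map each queue id to the URLs found in its -D (message body) file.

    Per the log, every queued message has a pair of files suffixed -D and -H,
    and the file name without the suffix is the queue id seen in the logs.
    Only the -D files are read here; the -H files are skipped.
    """
    results = {}
    for data_file in sorted(Path(spool_dir).glob("*-D")):
        queue_id = data_file.name[:-2]  # strip the trailing "-D"
        text = data_file.read_text(errors="replace")
        urls = URL_RE.findall(text)
        if urls:
            results[queue_id] = urls
    return results
```

Calling `find_urls_in_spool("/var/spool/exim4/input")` on the mail host would normally need root, since the spool is not world readable. The queue ids it returns should line up with what `exim4 -bp` or `mailq` shows.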