Thursday, 2022-08-11

ianwrocky 9 is still in a failure loop00:08
ianwits still looking for ntpdate ntp ntp-perl00:09
ianweither the project-config change isn't rolled out, or something else is going on00:09
ianwi think its the former00:10
clarkbdid the change to how we deploy project-config land? that was one of my concerns with it that somehow we'd break pushing project-config out00:11
ianwoh, heh, it would help if the change was actually merged00:11
ianwhttps://review.opendev.org/c/openstack/project-config/+/85251800:11
clarkboh heh I can review that00:11
clarkbdoesn't https://review.opendev.org/c/openstack/project-config/+/852518/2/nodepool/elements/infra-package-needs/install.d/10-packages change from not matching 9-stream to matching 9-stream?00:12
clarkboh hrm that is what the ps1 comments are about00:13
ianwactually i think you're right00:14
ianwthat should be a !00:14
clarkbya I just tested it locally and 9-stream =~ '9' evaluates to true under [[00:15
ianwand i need to turn it around in the other file too...00:15
opendevreviewIan Wienand proposed openstack/project-config master: nodepool: update package maps for Rocky 9  https://review.opendev.org/c/openstack/project-config/+/85251800:17
fungioh whoops00:18
clarkbianw: I think that update forgot the ! on the second file?00:19
ianwfor that one we want to match 9 and exit, right?00:19
clarkboh wait no its exit 0 in that condition to skip00:19
opendevreviewMerged opendev/system-config master: system-config-run: bump base timeout to 3600  https://review.opendev.org/c/opendev/system-config/+/85247900:19
opendevreviewMerged opendev/system-config master: Also pin pip/setuptools when creating Xenial venvs  https://review.opendev.org/c/opendev/system-config/+/85278600:19
clarkbya sorry we want similar behavior but achieve it by doing the inverse in the block so the condition should be inverted00:19
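The pitfall in this exchange is that an unanchored pattern match is really a substring test, so `9-stream` matches `'9'`. A rough Python analog of the bash `[[ ... =~ '9' ]]` check follows; the function names and the release strings other than `9-stream` are illustrative assumptions, not taken from the element script.

```python
import re

def matches_nine(release: str) -> bool:
    """Unanchored search, analogous to bash `[[ $release =~ '9' ]]`."""
    # re.escape mirrors bash treating a quoted right-hand side literally.
    return re.search(re.escape("9"), release) is not None

def is_exactly_nine(release: str) -> bool:
    """Exact comparison that only accepts the literal string '9'."""
    return release == "9"

# Hypothetical release values used only for illustration.
for release in ("9", "9-stream", "8"):
    print(release, matches_nine(release), is_exactly_nine(release))
```

The unanchored form is fine when a substring match is what is wanted; the point is only that the sense of the condition has to be chosen with that in mind.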
fungiand yay things finally merged00:21
ianwcool.  i'll remove the problem venvs, and merge the borg 1.1.18 update, and watch that it deploys 00:23
ianwit's really only storyboard01 and translate0100:27
ianwlists.openstack.org is totally non-puppeted now, isn't it?00:27
clarkbianw: correct00:28
fungiianw: cacti is our only other bionic server00:31
fungiand i guess we don't back it up00:31
ianws/bionic/xenial/ right?00:31
fungiyes, sorry, i meant xenial00:31
ianwi guess that is on the chopping block for Prometheus 00:33
fungiright00:33
fungistoryboard we still need to add ansible to deploy the containers we build00:33
fungiand zanata is... well... zanata just is00:34
opendevreviewMerged openstack/project-config master: nodepool: update package maps for Rocky 9  https://review.opendev.org/c/openstack/project-config/+/85251800:40
ianwit does seem like http://mirror.facebook.net/centos-stream/9-stream/BaseOS/x86_64/os/repodata/ doesn't have the same contents as http://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/repodata/01:10
ianwPackages/grubby-8.40-59.el9.x86_64.rpm also doesn't exist on the fb mirror, one of the recent updated packages01:12
ianwit is @ http://dfw.mirror.rackspace.com/centos-stream/9-stream/BaseOS/x86_64/os/Packages/grubby-8.40-59.el9.x86_64.rpm01:14
ianwI sent mail to mirror-admin@lists.fedoraproject.org on 14th april about rsync not working on the rax mirrors01:20
ianwi never got a response, but i'm guessing someone fixed something01:21
ianwi just got a 502 from gerrit :/01:26
ianwlogs have [2022-08-11T01:28:14.933Z] [HTTP-15318535] WARN  org.eclipse.jetty.util.thread.QueuedThreadPool : QueuedThreadPool[HTTP]@35bfa7be{STARTED,20<=100<=100,i=0,r=10,q=200}[ReservedThreadExecutor@723aea0f{s=6/10,p=1}] rejected Accept@b7fa454[java.nio.channels.SocketChannel[connected local=/127.0.0.1:8081 remote=/127.0.0.1:50968]]01:28
ianwthat starts at [2022-08-11T01:26:58.960Z]01:30
ianwat [2022-08-11T01:25:48.306Z] we had ERROR com.google.gerrit.httpd.GitOverHttpServlet.GerritUploadPackErrorHandler : Internal error during upload-pack from Repository[/var/gerrit/git/openstack/tacker.git] [CONTEXT project="openstack/tacker" request="GIT_UPLOAD" ]01:31
ianwit's over a minute later, so seems unlikely to be related01:31
ianwnothing in dmesg01:32
ianw3543612 gerrit2   20   0  121.3g 106.3g  60608 S  30.5  84.5 108119:50 java01:33
ianwnothing crazy in cpu/memory usage01:33
*** rlandy|bbl is now known as rlandy01:33
*** rlandy is now known as rlandy|out01:36
opendevreviewMerged opendev/system-config master: install-borg: update to borg 1.1.18  https://review.opendev.org/c/opendev/system-config/+/85248801:38
ianwwell it seems alive again01:43
ianwthe error is an overflow of the HTTP incoming requests queue (httpd.maxQueued on your gerrit.config)01:46
ianwwhich seems about right, ssh was working01:47
Clark[m]Gerrit is rejecting new connections and Apache returns 502?01:47
ianwin summary i'd say, yep01:48
Clark[m]We might be able to up those limits given the larger server size.01:48
Clark[m]But I guess we need to see if someone needs to be pointed at gitea too01:49
ianwin the httpd gerrit logs we have01:50
*** ysandeep|out is now known as ysandeep01:50
ianwan entry at 2022-08-11T01:26:37.694Z then the next at 2022-08-11T01:28:18.204Z01:50
ianwthe queuedthreadpool thing started at 01:28:1401:51
Clark[m]Cacti points to a spike in tcp connections but other system resources don't seem to follow01:51
Clark[m]I wonder if we have some other pause event (gc?) That causes tcp to backup01:52
Clark[m]But if ssh was fine that doesn't seem to line up either01:52
ianw# cat sshd_log | grep 2022-08-11T01:26 | grep sf-project-io | grep LOGIN | wc -l01:55
ianw39801:55
Clark[m]If/when it happens again dumping cache stats which includes jvm internal memory info may be helpful01:55
ianwit seems like sf-project-io logged in 398 times in one minute01:56
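The `grep | wc -l` pipeline above counts logins for one minute at a time. A minimal sketch of the same idea that buckets every minute in one pass might look like this; the sample lines are made up, and the real `sshd_log` layout may differ, so only the timestamp prefix and the `LOGIN` keyword are assumed.

```python
import re
from collections import Counter
from typing import Iterable

# Matches an ISO-8601 timestamp and captures it down to the minute.
MINUTE_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}")

def logins_per_minute(lines: Iterable[str], user: str) -> Counter:
    """Count lines that mention `user` and LOGIN, bucketed by minute."""
    counts: Counter = Counter()
    for line in lines:
        if user not in line or "LOGIN" not in line:
            continue
        match = MINUTE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines; not copied from the real log.
sample = [
    "[2022-08-11T01:26:01.000Z] aaaa sf-project-io LOGIN FROM 192.0.2.1",
    "[2022-08-11T01:26:02.000Z] bbbb sf-project-io LOGIN FROM 192.0.2.1",
    "[2022-08-11T01:27:00.000Z] cccc sf-project-io LOGIN FROM 192.0.2.1",
    "[2022-08-11T01:26:03.000Z] dddd someone-else LOGIN FROM 192.0.2.2",
]
for minute, count in sorted(logins_per_minute(sample, "sf-project-io").items()):
    print(minute, count)
```

Sorting the resulting counter by value would surface the busiest minutes without having to guess which timestamp to grep for.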
Clark[m]We may be in a gc loop or maxed out in the jvm on memory01:56
Clark[m]Oh weird01:56
ianwi wonder if that ran us out of tcp ... something and ended up hanging the webserver bits01:56
Clark[m]And that would push us over the limit. We have limits on the number of connections per IP and per user but maybe they aren't sufficient here for some reason 01:56
ianwthat is the most suspicious thing i can see01:58
Clark[m]Maybe Tristan can help track down what might've caused that on their end01:59
ianwtristanC: ^ any idea why this would have had a big loop of logins at this time?01:59
ianwin better news storyboard & translate have functioning borg venvs now, with 1.1.1802:06
opendevreviewMerged opendev/system-config master: letsencrypt-acme-sh-install: handle errors better in driver  https://review.opendev.org/c/opendev/system-config/+/69621102:47
*** pojadhav|out is now known as pojadhav|rover02:49
opendevreviewMerged opendev/system-config master: letsencrypt: make acme.sh exits clearer  https://review.opendev.org/c/opendev/system-config/+/85043502:49
*** ysandeep is now known as ysandeep|afk02:57
*** ysandeep|afk is now known as ysandeep03:10
*** ysandeep is now known as ysandeep|away03:25
opendevreviewIan Wienand proposed opendev/system-config master: system-config-run-borg-backup: rename hosts to distro  https://review.opendev.org/c/opendev/system-config/+/85268503:33
opendevreviewIan Wienand proposed openstack/project-config master: infra-package-needs: blank out coreutils for Rocky 9  https://review.opendev.org/c/openstack/project-config/+/85279803:38
opendevreviewMerged openstack/project-config master: infra-package-needs: blank out coreutils for Rocky 9  https://review.opendev.org/c/openstack/project-config/+/85279803:59
*** ysandeep|away is now known as ysandeep04:13
ianwhrm, https://review.opendev.org/c/opendev/system-config/+/852799 is there, but wasn't announced ^04:14
ianwand i don't think gerrit has picked it up04:14
ianws/gerrit/zuul/04:18
ianw[2022-08-11T04:00:19.374Z] [SSH git-receive-pack /opendev/system-config.git (iwienand)] WARN  com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker killed after 245282ms, cancelled (timeout=5282ms, task=RECEIVE_COMMITS(Processing changes))04:22
ianwguess what was happening at 04:00...04:23
ianwcat sshd_log | grep 2022-08-11T04:01 | grep sf-project-io | grep LOGIN | wc -l04:24
ianw9804:24
ianwit's more spaced out, but a lot of sf-project-io logins04:24
opendevreviewMerged opendev/system-config master: system-config-run-borg-backup: add jammy test host  https://review.opendev.org/c/opendev/system-config/+/85248905:32
opendevreviewMerged opendev/system-config master: gate-groups: remove old backup group  https://review.opendev.org/c/opendev/system-config/+/85268405:36
*** marios is now known as marios|ruck05:38
*** ysandeep is now known as ysandeep|afk06:04
ianw2022-08-11 06:16:57.023 | Build completed successfully ... rocky 9 finally worked06:22
*** ysandeep|afk is now known as ysandeep06:45
*** jpena|off is now known as jpena06:57
*** ysandeep is now known as ysandeep|lunch08:21
*** tosky_ is now known as tosky09:33
*** ysandeep|lunch is now known as ysandeep10:12
*** rlandy|out is now known as rlandy10:38
*** ysandeep is now known as ysandeep|afk10:55
*** dviroel|out is now known as dviroel11:31
*** ysandeep|afk is now known as ysandeep11:45
tristanCianw: i guess it's caused by a zuul's periodic trigger event. Is there something we could do about it?12:23
*** efoley_ is now known as efoley12:33
fungitristanC: maybe some of the newer jitter options for the timer trigger, if your zuul is new enough?12:48
tristanCfungi: thanks, i guess we'll need to update zuul to 6.2.013:16
fungithat's just one idea, others may have different suggestions13:20
*** rcastillo is now known as rcastillo|rover13:57
Clark[m]tristanC: fungi: opendev's periodic jobs don't seem to cause the same issues. Maybe compare the pipeline definitions and zuul connection settings for Gerrit?13:58
*** pojadhav|rover is now known as pojadhav|afk14:19
fungisetuptools 64.0.0 is out, as is pbr 5.10.014:23
fungiso keep an eye out for anything possibly related14:25
fungihttps://setuptools.pypa.io/en/latest/history.html#v64-0-014:25
fungithe big change in setuptools is the pep 660 editable installs implementation14:25
fungiso it could impact testing if projects have tox set to do editable by default, though the new stuff *should* only actually kick in for projects also doing pep 517 builds (via pyproject.toml configuration)14:27
fungi"Added ability of collecting source files from custom build sub-commands to sdist. This allows plugins and customization scripts to automatically add required source files in the source distribution."14:28
fungithat might actually come in handy for pbr14:28
JayFoh that's awesome, I've been waiting for setuptools to get that editable implementation14:31
Clark[m]tristanC: fungi: another thing we should rule out is sf hitting the login limit which causes immediate retries and a thundering herd. I don't have evidence of this just occurred to me that it could happen if the software retries aggressively 14:35
*** pojadhav|afk is now known as pojadhav15:14
*** marios|ruck is now known as marios|out15:30
*** pojadhav is now known as pojadhav|out15:31
clarkb'failed to open /etc/mailman/sites for linear search: No such file or directory' is the exim error for ianw's mm3 signup problem. I think that is an artifact of our old vhosting setup under mm2. I'll take a look at cleaning that up. I also notice that we have errors creating xapian indexes due to user in the upstream containers not aligning with our containers15:33
clarkber not aligning with our hosts. I think that the expectation is those containers do start as root then they change their process ownership to the baked in mailman user. But its uid is 100 :/15:34
clarkbI'm not sure what the best way to handle that is. Maybe we can bind mount an /etc/passwd that changes the uid to align with what we want?15:34
fungiyes, we have a custom exim router on our mm2 servers to look up which mailing list chroot to use based on the domain15:34
fungimm3 shouldn't need that15:34
clarkboh to make the uid/gid situation worse the mailman-web and mailman-core gids are different15:36
opendevreviewClark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/85124815:48
clarkbfungi: something like ^ based on the contents of: https://github.com/maxking/docker-mailman/tree/main/core/assets/exim ?15:48
clarkbI'm going to cycle out the held node for ^15:48
*** ysandeep is now known as ysandeep|dinner15:48
clarkbold hold deleted and new one created15:49
fungiclarkb: yeah, the simple config examples there ought to be valid for our case as well15:49
clarkbI'm still not sure what the best way to address the user mismatch is between host and containers15:50
clarkbwe could configure docker to do the user offset thing and then set perms outside the container appropriately15:50
clarkbWe could build our own image based on the upstream image that changes the uid and gid and chowns everything15:50
clarkbBut one problem at a time :)15:51
clarkbapparently making a new docker image to update /etc/passwd and /etc/group and chowning everything is a common choice here :/15:54
clarkbI guess if ^ works then my change may actually send some emails now. To nobody@openstack.org and test@example.com. That should be fine15:59
fungii thought we added firewall rules to prevent those from actually being delivered16:00
fungiif memory serves, i did that a while back when we were increasing our test coverage for ml server deployments16:01
fungiso that we could safely exercise the mta16:01
clarkbif we did I'm not finding them16:03
clarkbwe set the "don't send email" flag on the list creation command under mm216:03
*** dviroel is now known as dviroel|lunch16:04
clarkbwe run the service-lists.yaml playbook in the mm2 job and it creates the mailing lists. That test has template files for host vars for lists.o.o and lists.kc.io but neither seem to set up special iptables rules16:04
clarkbI guess we could've done it globally on all test nodes /me looks16:04
fungimmm, yeah i'm trying to find where/when i remembered doing that16:05
clarkbaha that is where the rules live16:05
clarkbsystem-config/playbooks/zuul/templates/group_vars/all.yaml.j216:05
fungiaha, yep that was last year in https://review.opendev.org/82090016:05
clarkblooks like we allow the port 25 connection over localhost but then reject it otherwise16:05
fungiright, that way mailman can send to exim, but exim can't send out16:06
clarkbthat explains why ianw's emails hit exim but if the config wasn't broken would've been blocked from there16:06
clarkbok cool16:06
fungii added it precisely for this case, and in preparation for the mm3 work16:06
clarkbin that case my fixed up change should have errors in exim sending email out, but not config related errors16:07
opendevreviewMerged openstack/project-config master: End project gating for openstack-helm-addons  https://review.opendev.org/c/openstack/project-config/+/85185716:07
clarkbfungi: I guess if/when we want to test email we'd do that in a controlled setting removing the iptables rule and then trying to sign up with one of our email addrs?16:08
clarkbsimilar to when we'd like to test a list's behavior16:08
fungiright16:09
fungithat way it's explicitly under our control16:09
fungidown the road, if we wanted automated testing for something like that in a job, we could add pass rules in the firewall for another job node16:10
clarkbI guess we have to be careful that any buffered messages don't all get out when we drop the iptables rule too16:10
*** jpena is now known as jpena|off16:10
fungiyes, `exim4 -Mrm ...` should allow us to delete them16:11
fungi`exim4 -bp` to list16:11
clarkbcool16:11
*** ysandeep|dinner is now known as ysandeep|out16:18
clarkbok job for that latest patchset completed and the node is held (104.130.172.61). There is only an exim mainlog. No rejectlog (or error log I forget what the full set is). That implies to me that maybe we didn't try to send email at all?16:24
*** gibi is now known as gibi_pto16:24
clarkb`exim4 -bp` returns no results fwiw16:25
clarkbI guess the next steps are probably to continue trying to sign up for an account on the server through the web ui and see what that does as far as sending email. Then create a mailing list with our account as owner to see if we get emailed?16:26
clarkbok new error: sender verify fail for <postorius@lists.opendev.org>: Unrouteable address16:33
clarkbI guess I need to update the exim config to make that valid?16:33
clarkbsomething in the mailman_verp_router I expect. But I don't understand it well enough to know what we should change16:34
clarkbsenders = "*-bounces@*" <- does that need to be updated?16:35
clarkbalso feel free to update the change. I think you understand this stuff a lot better than I do16:36
clarkband ya the rejectlog existing now after my error would imply to me that maybe we aren't trying to send email when creating lists or adding list owners. Still needs better testing of that behavior, but that is encouraging16:37
fungiyeah, so exim is currently configured to verify sender addresses on receipt16:43
fungineed to think about what the postorius address is used for and how we'll route it16:43
clarkbfungi: that error was generated by me trying to sign up for an account on the test server. It is used to send the email verification message at least16:45
fungiyep, just wondering what else it might get used for16:45
fungias soon as i get lunch cleared away i'll check the docs to see if they say16:46
clarkbgood point. The example exim config from the docker image repo shows -bounces -etc seem to align with mm216:46
clarkbso I don't think it is used for list management. But it could be that is an incomplete listing16:46
fungilooking at mm2 messages, we get notifications from mailman-owner@ for creation of new lists, from $foo-request@ when subscribing to a new list16:52
fungisystem-wide account creation doesn't really fall into the same sort of category though16:52
clarkbthe new require_files I pushed is buggy too I just realized. I gave it the in container path of the bind mount but exim is external to the containers so it needs the host side of the bind mount path16:54
fungiweb searches would work better if i didn't constantly try to add a third "o" to "postorius"16:54
fungii'll have to get back to this after lunch though16:55
*** dviroel|lunch is now known as dviroel16:55
opendevreviewClark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/85124816:57
clarkbthat fixes the bind mount side confusion16:57
opendevreviewMerged openstack/project-config master: Add official-openstack-repo-jobs to openstack k8s charms  https://review.opendev.org/c/openstack/project-config/+/85211716:59
clarkbcorvus: re software factory hitting review.opendev.org with a bunch of connections all at once, is there anything that tristanC should be looking at other than pipeline jitter for periodic jobs? I don't think that openstack periodic jobs have similar issues, and they run quite a number of jobs too. Is it possible that node request timing delays in opendev are naturally breaking up17:23
clarkbthose requests for git repos maybe? We merge before the node is ready but once an executor is assigned and nodes are ready we do a merge again for the job itself17:23
clarkbIf sf is running all of those jobs out of containers that don't have provisioning delays I wonder if we're compacting all the requests into a much shorter period of time than in opendev17:23
fungioh, i hadn't made that connection, but yeah that would make sense17:29
clarkbiirc the flow is event comes in and we do a merge for each event to determine which jobs to run. Then when job is ready to run on an executor that executor does the merge again for each job. In opendev the randomness of node provisioning likely acts as a good smoothing system for load on gerrit requests. But if your provisioning is near instant because it is effectively already17:31
clarkbprovisioned and you are just reserving a slice of it then you'll potentially generate a lot of merges in a much shorter period of time17:31
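The smoothing effect described here can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the job count (400) and the ten-minute spread are hypothetical numbers, the delay is modelled as a uniform random value, and none of it reflects how Zuul actually implements jitter or how nodepool schedules nodes.

```python
import random
from collections import Counter

def peak_per_second(start_times):
    """Return the largest number of events falling in any one-second bucket."""
    buckets = Counter(int(t) for t in start_times)
    return max(buckets.values())

def simulate(jobs: int, max_delay: float, seed: int = 0):
    """Trigger `jobs` at t=0, each delayed by a uniform random amount."""
    rng = random.Random(seed)
    return [rng.uniform(0, max_delay) for _ in range(jobs)]

jobs = 400                                   # hypothetical number of jobs
no_delay = simulate(jobs, max_delay=0.0)     # everything fires at once
spread = simulate(jobs, max_delay=600.0)     # spread over ten minutes

print("peak without delay:", peak_per_second(no_delay))
print("peak with delay:   ", peak_per_second(spread))
```

With no delay every request lands in the same second; with a random delay the per-second peak drops sharply, which is the effect both node provisioning latency and timer jitter would be expected to have on Gerrit.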
fungithe mm3/postorius docs are rather circular17:38
fungipostorius: "for config instructions see mm3 docs"17:39
fungimm3: "postorius is the management interface, see its docs for further info"17:39
fungihttps://docs.mailman3.org/projects/mailman/en/latest/src/mailman/docs/postorius.html17:40
clarkbThe mm3 docs have been painful to work with. Their rest docs don't really show you much about the requests and hide it all behind helper functions they've written17:43
clarkbWhich is fine to also document. But the API should be more explicit imo17:43
fungithe main thing i want to figure out is if there is any expectation that you can send control messages to the postorius address, or whether we need to just nullroute it17:48
clarkbBased on the exim4 configs in the mm3 docker repo I expect we can nullroute it otherwise they would've called the addr out there17:49
fungithat's my assumption too, just trying to find any indication that it could be a bad assumption17:49
fungilooking at acl_check_rcpt in our current exim4.conf, we automatically accept messages with a null host (comments say that's local injection), and also anything for postmaster at local domains, then apply sender verification to anything else17:55
fungii guess messages originating from within containers may be breaking that assumption and we might need to add some additional exceptions for loopback or whatever?17:57
clarkbis it looking at the tcp connection details or the sender in the headers?17:58
clarkbin this case it should be a local tcp connection17:58
clarkbthe error specifically says sender <postorius@lists.opendev.org> couldn't be verified which makes me think the issue is actually at a header level?17:59
clarkb(if it wasn't a local tcp connection then our iptables rule would've blocked it)17:59
fungiwell, the address it's verifying is the sender specified in the smtp protocol header, yes18:02
fungithe exclusions before that verify rule could be based on tcp connection information though18:02
fungithe "require verify = sender" line in acl_check_rcpt:18:05
fungithe main exclusion before that rule seems to be "accept hosts = :" which the comments say is checking for local mail injection, but i guess that's from calling /usr/lib/sendmail or the like18:06
fungiotherwise the host would be 127.0.0.1 or ::118:06
fungior localhost or something like that18:07
clarkbthe comment says " Accept if the source is local SMTP (i.e. not over TCP/IP)."18:07
clarkbthat would be dropping files into the correct location? This is smtp though over localhost:2518:07
clarkboh and localdomains would be lists99.opendev.org not lists.opendev.org or lists.openstack.org etc18:08
clarkbWe could set exim_local_domains? But I'm not sure that is the right way to solve this. Might be better to make the address verifiable, but I'm not sure what that requires18:09
fungiprobably adding postorius to /etc/aliases would suffice18:10
fungiwe could manually stuff it into /etc/aliases on the real lists.o.o if we wanted, assuming the test server doesn't consider it to be local18:12
clarkbis the test server querying exim on the prod lists.opendev.org server to verify that?18:12
fungiin theory, yes, unless it thinks lists.opendev.org is a local domain18:13
clarkbhrm we add lists.opendev.org to mm_domains which is added to exim_local_domains18:15
clarkboh but we only allow postmaster at the local domain18:16
clarkbso ya I think it would be sufficient to add it to /etc/aliases on the test server18:16
*** tosky_ is now known as tosky18:16
fungiyeah, domainlist localdomains in the exim4.conf has it18:17
fungier, local_domains18:17
clarkbfungi: https://review.opendev.org/c/opendev/system-config/+/851248/56/playbooks/zuul/files/host_vars/lists99.opendev.org.yaml line 42 or so we add an entry? I'm not sure what we should alias it to though18:18
fungiwe could alias it to one of the magic addresses like :fail:18:19
clarkbfungi: something like: '  postorius: :fail: Outgoing email only from this address' ?18:20
fungiit's worth a try. you could add it locally on the server and test18:20
clarkbok let me try that18:20
fungijust stick that line in /etc/aliases18:20
clarkbheh "a user is already registered with this address" Not a good failure method if the server 500s but creates the records anyway. /me tries another address18:22
fungicould probably also delete the broken account18:22
clarkbsender verify fail for <postorius@lists.opendev.org>: outgoing email only from this address18:23
fungithough also, maybe password recovery process works with the account18:23
clarkbthat means /etc/aliases did modify the behavior but :fail: isn't what we want18:23
fungiokay, so 1. we know it's checking the local delivery, and 2. we can't :fail: it18:23
fungi:blackhole: maybe?18:24
fungihttps://www.exim.org/exim-html-3.20/doc/html/spec_23.html#SEC63418:24
corvusclarkb: tristanC: jitter would be the main thing.  https://review.opendev.org/848516 could potentially help in some cases (less likely to help with periodic jobs, but otherwise very good for a 3pci system to cut down on its impact).18:24
clarkbfungi: that worked. It claims email was sent to me18:25
fungiinteresting!18:26
corvusblackhole is ~= delivering to /dev/null18:26
corvussuccessfully18:26
fungiyeah, mainly just confirming that's effective for bypassing sender verification18:26
clarkbcorvus: ya in this case mm3 uses postorius@listdomain to send email verification emails to people doing account signups18:26
clarkbcorvus: those emails were getting rejected on sender verification by exim. Will it blackhole the email even if it is the sender? or just if it is the recipient?18:27
corvusjust rcpt18:27
clarkbI didn't get the email but didn't expect to due to our iptables rules so hard to say why it wasn't delivered.18:27
clarkbAh ok so this is probably workable (for now anyway)18:27
clarkbas an alternative I could alias it to mailman which is a local user18:28
fungia cleaner option might be to configure exim to skip checking the postorius sender address and then add an explicit :fail: message for it for cases where users try to send something to or reply to that address18:28
fungirather than accepting messages for it and throwing them away18:28
fungibut this is sufficient for now18:28
clarkbI can add a TODO to the yaml config to improve the situation. Will push a patch up shortly18:28
clarkbin the exim mainlog I see it failing to connect to send that email indicating our iptables rules are working18:29
fungiperfect. broken as designed! ;)18:29
clarkband there are definitely no logs like that for nobody@openstack.org or test@example.com implying that creating lists and adding owners isn't generating emails to them18:30
clarkbBut I want to do more testing of that before we roll with it18:30
fungiyeah, for the test list creation we pass a flag to tell it not to send a notification18:31
fungior at least we did at one point18:31
opendevreviewClark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/85124818:32
clarkbfungi: thats a mm2 only thing as far as I can tell18:32
fungioh, got it18:32
clarkbthe mm3 rest api doesnt appear to have toggles for that sort of thing18:32
clarkbI'm going to update the hold to catch that latest update18:33
clarkbit might also be some sort of situation where if an account matching that email address exists and the email addr is verified then you'll get emailed18:34
clarkbBut when boot strapping lists like this with no accounts it won't send email to unverified addresses18:35
clarkbIf that is the case then the next thing we need to sort out is whether or not adding an account for that email address later will properly associate the ownership to the list18:35
clarkbI think we can test this by creating a user to match one of those email addresses, then using django admin interface to manually verify the email, then see if we are a list admin18:36
tristanCcorvus: clarkb: ianw: ok thank you, we'll update zuul and setup the jitter settings to reduce the number of connections opened by sf-project-io18:38
clarkbfungi: I've got to do lunch and drop the kids off after, but is there a way to read the unsent email off disk from exim? If so I should be able to follow the link that postorius sends out to verify my account that way without the email actually getting delivered18:55
corvusclarkb: /var/spool/exim18:57
clarkbcorvus: thanks!18:58
corvusthere's a pair of files for each msg in the queue (suffixed with -D and -H)18:58
fungiyep, that exactly18:58
corvusfilename is queue id which shows up in the logs18:58
fungialso the same queue id which shows up in `exim4 -bp` or `mailq` output18:59
fungiin a parallel directory you'll also find an ephemeral delivery log for that queue item19:00
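The `-H`/`-D` pairing described above can be walked with a few lines of Python. This is a minimal sketch, assuming a flat spool directory (Exim can also be configured to split the spool into subdirectories, which this does not handle); the queue ids and the temporary directory in the demo are fabricated stand-ins for the real spool path.

```python
from pathlib import Path
import tempfile

def queued_messages(input_dir: Path) -> dict:
    """Map queue id -> (header file, data file) for complete -H/-D pairs."""
    pairs = {}
    for header in sorted(input_dir.glob("*-H")):
        queue_id = header.name[:-2]          # strip the "-H" suffix
        data = header.with_name(queue_id + "-D")
        if data.exists():
            pairs[queue_id] = (header, data)
    return pairs

# Demo against a throwaway directory with made-up queue ids.
with tempfile.TemporaryDirectory() as tmp:
    spool = Path(tmp)
    for name in ("1oMabc-0001Xy-2Z-H", "1oMabc-0001Xy-2Z-D", "1oMdef-0002Ab-3C-H"):
        (spool / name).write_text("placeholder\n")
    found = queued_messages(spool)
    print(sorted(found))
```

On a real host the function would be pointed at the spool's input directory, and the `-D` file of the matching queue id is where the message body (and so the confirmation link) would be found.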
clarkbgreat that should allow me to check account verification without worrying about spam filters and iptables.19:01
clarkbAnd then after that I think I'll try creating another list and make myself an owner and see if it tries to send email to me19:02
clarkband maybe also create another list for an uncreated user and see if it tries to send email to them19:02
fungiyeah, i agree, not sending to addresses which don't have an account may be a safety measure in order to avoid becoming as much of a potential spam source19:03
fungii think bulk account creation is part of the list subscriber migration19:03
fungiidea being people will get new mm3/postorius accounts the first time a list they're subbed to is imported, and if they want to use it they can do a password reset dance through the webui19:04
fungiwe would presumably do the same for list owners/moderators19:06
clarkbit definitely lets me add an owner email address when that email addr doesn't have an associated account yet19:06
clarkbthat is what the current change does, it sets owner to test@example.com19:06
fungihopefully once the account for that address is created the user will be able to manage the list19:06
clarkbyup exactly19:07
*** dviroel is now known as dviroel|afk19:07
fungiand unlike mm2, no more shared passwords for owner/mod access, it's just associated with your account (so you have a login to postorius which gets you access to manage all the lists you're associated with, as well as your subscription settings for any to which you're subscribed)19:09
clarkb158.69.70.114 is the new host fwiw19:14
opendevreviewDavlet Panech proposed openstack/project-config master: Add starlingx/jenkins-pipelines repo  https://review.opendev.org/c/openstack/project-config/+/85291919:47
clarkbI have successfully created an account on that server by grabbing the confirmation url out of /var/spool/exim4/input20:21
clarkbI just tested that a single account is valid across multiple vhosts. Though I had to log in again20:22
clarkb(the cookies are domain scoped I guess so that makes sense)20:23
fungiperfect. and yeah that's something i'd tested in the original poc as well but good to see the new orchestration and container bits don't change the behavior20:23
fungidoes creating a new list with your account as the owner generate a notification?20:25
clarkbI haven't managed that yet. Need to figure out how to do it from curl20:25
clarkbbut that is next20:25
fungithe cli tools don't work any longer?20:28
fungii thought i'd used those in an earlier 3.x20:29
clarkbI'm sure they do but I hate them20:33
clarkbmostly because they think that documenting the rest api means "here's some python to run"20:33
clarkband they aren't even commands, it's fire up an interpreter and import some stuff and call a function20:33
clarkbI find it extremely clunky and far prefer using something like curl for things like this20:34
clarkbI should be able to understand your api without firing up a python interpreter20:34
fungioh, i meant the actual executable scripts like `newlist`20:35
clarkbfungi: I didn't realize those existed; all the docs examples have you import something.somethingelse.cli20:37
clarkbthen you call functions out of the cli that way20:37
fungiwell, docker-compose exec doesn't seem to be viable with these anyway20:37
clarkbit isn't interactively since that bug20:38
clarkbit did work last week but then runc broke us20:39
clarkbanyway list has been created and I've been added as an owner20:39
fungioh, i can actually run some things, looks like20:39
fungino, nevermind, i faked myself out20:39
fungiOCI runtime exec failed: exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown20:39
clarkbIt did not send me an email20:39
clarkbyes that's the runc bug20:39
clarkbhttps://github.com/opencontainers/runc/issues/355120:40
fungiick20:40
clarkbanyway it works if you drop -t and just do random commands but that won't work with the mailman3 examples of running an interpreter interactively20:41
clarkbit's fine, curl works great :)20:41
clarkbbut ya no emails generated from that. In the web ui I see different options for the list I own now too so that bit seemed to work20:41
clarkbLooks like it doesn't auto sub you to a list when you are owner. But I think that is fine20:41
fungipretty sure mm2 never did either20:42
fungisubscribe the owner i mean20:42
clarkbnow the thing to do is for me to create a user for test@example.com and see if that new user is associated with ownership on the existing list20:42
clarkbheh ok that didn't work because test@example.com has no smtp service so exim completed that delivery properly without spooling things?20:45
clarkbI'll create a third list with a third email addr as owner and then sign up for that20:45
clarkbyup when I make a user and verify the email and login I'm an owner for the list that was precreated with my email as owner20:49
fungi$ host -t mx example.com20:49
fungiexample.com mail is handled by 0 .20:49
fungithat may result in some strange mail routing20:49
clarkbfungi: exim just says "I'm done don't need to do anything"20:49
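For reference, the `0 .` answer above matches what RFC 7505 calls a null MX: a single MX record with preference 0 and the root label as the exchange, which declares that the domain accepts no mail. A minimal Python sketch of recognising that case follows. The function names and the parsing of `host` output are illustrative assumptions, not a description of what exim does internally.

```python
def parse_host_mx_line(line):
    """Parse one line of `host -t mx` output into (preference, exchange).

    Assumes the "<domain> mail is handled by <pref> <exchange>" wording
    shown in the log above.
    """
    marker = " mail is handled by "
    _domain, _, rest = line.partition(marker)
    pref, exchange = rest.split()
    return int(pref), exchange


def is_null_mx(records):
    """Return True if the MX record set is a null MX per RFC 7505.

    `records` is a list of (preference, exchange) tuples. A null MX is
    exactly one record with preference 0 and exchange ".".
    """
    if len(records) != 1:
        return False
    preference, exchange = records[0]
    return preference == 0 and exchange == "."
```

With the line fungi pasted, `is_null_mx([parse_host_mx_line("example.com mail is handled by 0 .")])` is true, which is consistent with the mail never reaching the spool.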
clarkbanyway I think this confirms a few important details of mailman3 behavior for us20:49
clarkbfirst is that creating a list and adding an owner to it (at least via the rest api) does not spam the owner. Second if we set the owner before an account exists for that email addr it will auto associate the ownership with that email once the account is created and email is verified20:50
clarkbthat means we should be able to populate the existing change with all of our lists and set the owners properly in testing now20:50
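The two REST operations discussed above (create a list, then attach an owner address that may not have an account yet) can be sketched roughly as below. This is not the curl invocation clarkb used. The base URL, the credentials and the list and owner addresses are placeholder assumptions, and the code only builds the requests without sending them. The `fqdn_listname`, `list_id`, `subscriber` and `role` fields follow the Mailman 3 core REST documentation.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Assumed values for illustration only; a real deployment sets its own.
BASE_URL = "http://localhost:8001/3.1"
REST_USER = "restadmin"
REST_PASS = "restpass"


def _post(path, fields):
    """Build (but do not send) a form-encoded POST with basic auth."""
    data = urlencode(fields).encode("ascii")
    req = Request(f"{BASE_URL}/{path}", data=data)
    token = base64.b64encode(f"{REST_USER}:{REST_PASS}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


def create_list_request(fqdn_listname):
    """Request that would create a list such as name@lists.example.org."""
    return _post("lists", {"fqdn_listname": fqdn_listname})


def add_owner_request(fqdn_listname, owner_email):
    """Request that would add owner_email as an owner of the list."""
    # The list id is the posting address with "@" replaced by ".".
    list_id = fqdn_listname.replace("@", ".")
    return _post(
        "members",
        {"list_id": list_id, "subscriber": owner_email, "role": "owner"},
    )
```

Passing either returned object to `urllib.request.urlopen` would send it. Building the requests separately keeps the sketch inspectable without a running Mailman core.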
fungithinking about it a bit more, the notifications are probably no longer necessary anyway, because there's no precreated admin password to send them20:51
fungiso this is just fine and dandy20:51
clarkbfungi: any objections to me updating the change with all of our current lists and setting the owner to the actual owners given ^?20:53
fungino objection on my part20:53
fungiespecially since outbound smtp will be blocked from the test node initially anyway, just in case20:53
clarkb++20:54
clarkbfungi: in mm2 there are mailman@domain lists. Any idea if we still need those?20:59
clarkbI'm not sure I ever really understood the functionality of those lists21:00
opendevreviewJeremy Stanley proposed openstack/project-config master: Add #openstack-latinamerica to accessbot  https://review.opendev.org/c/openstack/project-config/+/85292221:06
opendevreviewJeremy Stanley proposed opendev/system-config master: Add IRC logging for #openstack-latinamerica  https://review.opendev.org/c/opendev/system-config/+/85292321:06
fungiclarkb: those lists are used primarily for the monthly password reminders we disable, but also for things like owner e-mail i think? anyway pretty sure mm3 does not need them21:08
clarkbcool I'll go ahead and remove them21:11
fungialso when we're getting close to migrating any list domains, we should take the opportunity to check whether there are any more we can/should retire in order to avoid migrating more than necessary21:13
clarkbI was wondering about that recently. What is our plan for archives of old lists? Do we need to end up creating them just to give them somewhere to live?21:15
opendevreviewClark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/85124821:17
funginope, we can just copy the old pipermail archive tree straight over. for both retired and active lists. that keeps the old urls for the archives working21:17
clarkbI'm going to swap out the holds for ^21:17
fungifor active lists we'd also import the archives, but keep the old pipermail copies served as well since the url patterns differ and people have linked to them all over21:18
clarkbbut that adds all the lists to the testing node, removes the password attribute for lists and renames admin to owner to align with the api better21:18
clarkbhttp://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_fac/851248/58/check/system-config-run-lists3/fac28b8/bridge.openstack.org/screenshots/mm3-openstack-main.png that looks good. I checked the exim spool and logs to be extra sure no emails were attempted and it looks good21:48
fungiawesome!21:49
clarkbI'm going to take a break, but when I get back I'm going to put mailman down for now. The next thing is going to be fussing with users and file perms and all that to see if I can make xapian happy in a reasonable way21:49
clarkband that I expect to be quite time-consuming21:49
fungihaving dealt with xapian in a desktop setting years ago, i don't believe it will ever truly be happy21:50
clarkbheh. In this case it is because the mailman containers run as uid 100 but I've set things up to have uid something else on the host side. I don't really want to use 100 on the host side because that's _apt or some such21:51
clarkbI think what we need to do is find a way to change the uid on the container side (bind mounting in /etc/passwd or doing our own images that update the upstream ones)21:51
clarkbnone of the options here are particularly good. I'm a bit surprised that the upstream images use such a low uid considering it is almost always going to collide with something on the host system21:52
clarkbanyway 23.253.108.60 is the most recently held node if you want to look at it21:52
fungimaybe that's a kubernetes assumption showing through21:52
clarkbthey use docker-compose actually21:52
fungioh, interesting21:52
fungiyeah, very odd choice21:52
clarkbfungi: re 23.253.108.60 one of the other things to test on the list is list behavior. Can we send email to subscribers, what about dmarc, what about private lists and so on21:53
clarkbnot sure if you wanted to poke at that but the node is there and ready for it if you have time21:53
fungiyep, i should be able to give that stuff a shot and make some notes/observations, though probably tomorrow morning at this point21:58
clarkbin theory we should be able to patch in a lot of those settings if we want to make them consistent. I think mailman also has templates and styles (not sure how they differ) that we might be able to set up then apply to lists as necessary22:01
clarkbconsidering everything is already automated we probably don't need to rely on those features much and can directly configure what we want22:02
clarkbheh even the register has picked up on the ssh sha1 problems https://www.theregister.com/2022/08/11/red_hat_ssh/22:03
*** rlandy is now known as rlandy|bbl22:12
opendevreviewMerged openstack/project-config master: Add starlingx/jenkins-pipelines repo  https://review.opendev.org/c/openstack/project-config/+/85291922:21
clarkbhttps://github.com/maxking/docker-mailman/blob/main/web/docker-entrypoint.sh#L148-L150 adds more mystery to the user problems. I would've expected that to address it22:23
clarkbI think maybe they removed the bit that allowed uid and gid to be configurable based on that comment. But the chown should've made things work for xapian22:24
clarkbI wonder if we can just get away with precreating the dir that xapian wants so that privileged chown will apply to it. I suspect the problem is that xapian is trying to create the dir after it has dropped privs22:26
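The precreation idea can be sketched as follows: make sure the directory exists before the privileged chown runs, so ownership is already correct by the time an unprivileged process wants to write there. This is only an illustration of the approach under discussion. The function name is hypothetical, and the real path and the numeric ids would come from the deployment, not from this snippet.

```python
import os


def precreate_index_dir(path, uid, gid):
    """Create `path` if missing and hand it to uid:gid.

    Must run with enough privilege to chown to the target ids
    (for example as root, before privileges are dropped).
    """
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    os.chown(path, uid, gid)
    return os.stat(path)
```

Whether this belongs in the host-side setup or in the container entrypoint is left open here. The sketch only shows the ordering: create the directory first, then chown it.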
opendevreviewMerged openstack/project-config master: Add #openstack-latinamerica to accessbot  https://review.opendev.org/c/openstack/project-config/+/85292222:33
fungioh, maybe23:28
opendevreviewIan Wienand proposed openstack/project-config master: linter: update ansible-lint; add auto-download of roles  https://review.opendev.org/c/openstack/project-config/+/85127823:29
ianwgosh xapian is something i haven't heard in a long time23:36
clarkbianw: it is the recommended hyperkitty indexer23:38
clarkbthe default is something else but apparently the default will change in the next release or something and be xapian23:38
ianwlast time i used it was with a fairly popular moinmoin wiki for Itanium linux development with Gelato@UNSW23:39
ianwlinux and UNSW still exist, none of the other bits do :)23:39
fungii suppose it's no worse than mediawiki's search plugin using logstash23:39
fungier, elasticsearch i mean23:40
fungii bet xapian is still using an open source license at least ;)23:40
opendevreviewClark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/85124823:41
ianw"Change 852922 in project openstack/project-config does not share a change queue with 852923 in project opendev/system-config"23:41
ianwon https://review.opendev.org/c/opendev/system-config/+/852923 ... that's a new one?23:42
clarkbfungi: ^ I put a hold on that latest patchset but didn't delete the older one as I'm not sure I won't break things. I think if this newer one deploys you can use it to debug email stuff just fine. Otherwise fallback to the old one and I'll clean up once we know how things look23:42
clarkbianw: the reporting is new but the error is not.23:42
clarkbianw: it just needs a recheck or reapproval once the depend on has merged23:43
ianwclarkb: are you ok with https://review.opendev.org/c/opendev/system-config/+/852793 for now?23:43
clarkbianw: ya I tried to make it clear my -1 was more about getting tripleo to think about their needs instead of blindly choosing something. But if the old upstream mirror is up to date switching to it now should be fine23:43
clarkbianw: I was worried that if I +2'd it would've gotten lost23:43
clarkbI also responded to the openstack-discuss thread on that pointing out that we do make changes to our mirror backend stuff and don't really consider it to be a public interface23:44
clarkbthings like converting pypi from bandersnatch to caching proxy and removing all source packages23:44
ianwyeah, i agree we want to do a bit of research before just switching blindly23:44
ianwin this case i think we have; i've tried to reach out at least23:45
fungiianw: as for the verified -2, it's a side effect of starting to report the approval of a dependent in a non-shared queue instead of completely ignoring the approval event, with unanticipated fallout for the openstack tenant's "clean check" rule23:50
opendevreviewMerged openstack/project-config master: linter: update ansible-lint; add auto-download of roles  https://review.opendev.org/c/openstack/project-config/+/85127823:56
opendevreviewMerged opendev/system-config master: system-config-run-borg-backup: rename hosts to distro  https://review.opendev.org/c/opendev/system-config/+/85268523:57
