diablo_rojo | Hello... ummm it seems there is no ptgbot in the #openinfra-events channel? Would someone be able to restart it? | 00:02 |
fungi | i just spotted the same. looking into it now | 00:03 |
diablo_rojo | lol yeah... | 00:03 |
diablo_rojo | Whoops | 00:03 |
fungi | the last thing it logged to its debug log was on 2022-05-25 | 00:04 |
diablo_rojo | Oh okay so before the Summit even. | 00:04 |
diablo_rojo | Heh | 00:04 |
fungi | though the process is running and says it was started 2022-05-13 | 00:04 |
diablo_rojo | Whoops | 00:04 |
fungi | it probably lost contact with the irc server it was connected to and never realized it's been listening on a dead socket since then | 00:05 |
fungi | i'll restart the container | 00:05 |
diablo_rojo | Thank you fungi ! | 00:05 |
diablo_rojo | much appreciated | 00:05 |
fungi | #status log Restarted the ptgbot container on eavesdrop01 since it seems to have fallen off the IRC network on 2022-05-25 and never realized it needed to reconnect | 00:06 |
opendevstatus | fungi: finished logging | 00:06 |
fungi | the container's equivalent of "have you tried turning it off and on again" | 00:07 |
opendevreview | Merged openstack/project-config master: Match the ansible-lint <6.5 pin from zuul-jobs https://review.opendev.org/c/openstack/project-config/+/855098 | 00:24 |
diablo_rojo | I maybe now have killed the ptgbot site? :D | 00:34 |
fungi | i can check the apache logs | 00:37 |
fungi | Connection refused: AH00957: HTTP: attempt to connect to 127.0.0.1:8000 (localhost) failed | 00:38 |
diablo_rojo | In my defense, it wasn't faulty json this time lol | 00:38 |
diablo_rojo | At least syntactically | 00:38 |
fungi | 2022-08-30 00:31:22,254 ERROR ptgbot.bot: Bot airbag activated: Unusually large message: diablo_rojo: Error loading DB: [Errno Expecting property name enclosed in double quotes] | 00:41 |
fungi | whatever it was crashed the process that listens on 8000/tcp, i think | 00:42 |
fungi | seems like maybe the db needs to be wiped? | 00:43 |
fungi | i guess it managed to partially write something while crashing? | 00:43 |
fungi | i can try restarting it and see if it's really the db contents causing the problem | 00:44 |
fungi | the ptgbot-web process started up okay after downing and upping the container again | 00:46 |
diablo_rojo | I cleared the db | 00:47 |
diablo_rojo | Gonna do some local testing before I go loading the bot again. I thought it was a simple json tweak, but I don't want to keep needing you to restart. | 00:48 |
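For the kind of local testing mentioned here, a quick syntax check of the ptgbot database JSON catches this class of error before the bot ever sees it. A minimal sketch, assuming the data lives in a local JSON file; the "ptg.json" filename is a placeholder, not necessarily what the bot uses:

```python
# check_ptg_json.py -- validate a ptgbot database file before loading it.
import json
import sys

path = sys.argv[1] if len(sys.argv) > 1 else "ptg.json"  # placeholder name
try:
    with open(path) as f:
        json.load(f)
except json.JSONDecodeError as err:
    # The bot reported "Expecting property name enclosed in double quotes",
    # the classic symptom of a trailing comma or a single-quoted key.
    print(f"{path}: invalid JSON: {err}", file=sys.stderr)
    sys.exit(1)
print(f"{path}: parsed cleanly")
```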
diablo_rojo | Thank you for all your help already fungi ! | 00:48 |
fungi | you bet. i'll be around for a while longer if you need to test some more | 00:49 |
*** rlandy|bbl is now known as rlandy | 00:54 | |
*** rlandy is now known as rlandy|out | 01:09 | |
*** dasm is now known as dasm|off | 02:00 | |
*** ysandeep|out is now known as ysandeep | 04:52 | |
*** pojadhav|out is now known as pojadhav|ruck | 04:59 | |
opendevreview | Ke Niu proposed opendev/system-config master: remove unicode prefix from code https://review.opendev.org/c/opendev/system-config/+/854487 | 05:38 |
*** ysandeep is now known as ysandeep|afk | 05:41 | |
*** ysandeep|afk is now known as ysandeep | 06:02 | |
opendevreview | Rafal Lewandowski proposed openstack/diskimage-builder master: added elrepo element https://review.opendev.org/c/openstack/diskimage-builder/+/853817 | 07:06 |
*** ysandeep is now known as ysandeep|afk | 07:36 | |
*** jpena|off is now known as jpena | 07:36 | |
*** elodilles_pto is now known as elodilles | 08:06 | |
*** ysandeep|afk is now known as ysandeep | 09:12 | |
opendevreview | Rafal Lewandowski proposed openstack/diskimage-builder master: removed -l from shebang https://review.opendev.org/c/openstack/diskimage-builder/+/855154 | 09:28 |
*** soniya29 is now known as soniya29|afk | 10:26 | |
*** rlandy|out is now known as rlandy | 10:38 | |
*** ysandeep is now known as ysandeep|break | 11:27 | |
*** dviroel|out is now known as dviroel | 11:30 | |
*** soniya29|afk is now known as soniya29 | 11:31 | |
*** ysandeep|break is now known as ysandeep | 12:20 | |
*** dasm|off is now known as dasm | 13:46 | |
*** ysandeep is now known as ysandeep|dinner | 14:39 | |
*** dviroel is now known as dviroel|mtg | 14:43 | |
*** ysandeep|dinner is now known as ysandeep | 15:00 | |
*** dviroel|mtg is now known as dviroel | 15:18 | |
*** artom__ is now known as artom | 15:34 | |
*** dviroel is now known as dviroel|lunch | 15:40 | |
fungi | judging by a topic branch in the docker-jitsi-meet repo, there was apparently recent work to add support for etherpad's ep_whiteboard plugin. that might be worth looking into | 15:48 |
clarkb | fungi: related, is there a change to update the meetpad config for the audio stuff yet? | 15:49 |
clarkb | does it make sense to start with just setting the one setting and then reconciling the config deltas afterwards? | 15:49 |
fungi | i was restarting my work on that, which is what caused me to notice the wbo stuff | 15:50 |
fungi | and yeah, the challenge is that settings have moved from one place to another, and at the same time some things we were setting have changed to become defaults | 15:51 |
fungi | so i'm probably going to try a clean import of the configs first | 15:51 |
fungi | and then identify what bits we still may want to override | 15:51 |
clarkb | fungi: on the mm3 update a good canary I've found is that the hyperkitty listing renders properly: https://4b7ee2cce31df7bf2b0a-c162fa8e75cb459a7d10e69223bc94c7.ssl.cf1.rackcdn.com/851248/64/check/system-config-run-lists3/915494e/bridge.openstack.org/screenshots/mm3-opendev-archives.png I think those updates lgtm based on that page having the lists listed | 15:52 |
clarkb | after my current meeting I'm going to hunt down some food, but then I'll look into comparing migrated lists against new list configs to see how they compare and if we should be setting any additional config items | 15:53 |
*** ysandeep is now known as ysandeep|out | 15:56 | |
fungi | in theory all the relevant options were mapped over, and at least the incidents list didn't end up with a public archive, which is good | 15:56 |
clarkb | ya I mean for the new lists | 15:56 |
fungi | oh, right-o | 15:57 |
clarkb | using migrated lists as a sanity check against our defaults for new lists | 15:57 |
fungi | yes, agreed | 15:57 |
fungi | it should be possible to dump the configs and do a side-by-side comparison | 15:57 |
fungi | the held node has some of both (i only migrated three lists) | 15:58 |
fungi | i can also try importing some others if there are specific kinds we're interested in, but the three i did were our basic three archetypes (announcements, discussion, private) | 15:59 |
*** NeilHanlon_ is now known as NeilHanlon | 16:03 | |
clarkb | yup I was going to compare service-discuss and openstack-discuss to start. And ya the rest api will dump a json representation of the config which we can diff | 16:04 |
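A rough sketch of that dump-and-diff against the Mailman core REST API, assuming its defaults (localhost:8001, basic auth); the list IDs and credentials here are placeholders:

```python
import json
import requests

REST = "http://localhost:8001/3.1"
AUTH = ("restadmin", "restpass")  # assumed default credentials

for list_id in ("service-discuss.lists.opendev.org",
                "openstack-discuss.lists.openstack.org"):
    resp = requests.get(f"{REST}/lists/{list_id}/config", auth=AUTH)
    resp.raise_for_status()
    with open(f"{list_id}.json", "w") as out:
        # sort the keys so a plain diff of the two files lines up
        json.dump(resp.json(), out, indent=2, sort_keys=True)
```

A plain `diff` of the two resulting files then shows exactly which attributes differ.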
fungi | another option might be to hold a second node and dump the config from the pre-import service-discuss for comparison to post-import | 16:16 |
fungi | or just redo the migration on a new held node and compare before/after configs | 16:17 |
clarkb | I don't think that is necessary since all of them are created the same way with ansible and the rest api | 16:32 |
clarkb | so they've all got the same config when created new | 16:33 |
*** dviroel|lunch is now known as dviroel | 16:33 | |
*** jpena is now known as jpena|off | 16:42 | |
clarkb | fungi: on the imported list these are the settings that differ from the default created list: convert_html_to_plaintext, filter_extensions, process_bounces, pass_types | 16:43 |
clarkb | fungi: on the held node in my homedir you can diff service-discuss2.json and openstack-discuss2.json | 16:43 |
clarkb | https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/handlers/docs/filtering.html#conversion-to-plain-text for the first one | 16:46 |
clarkb | maybe we double check lynx is installed in the docker images and then default to that? | 16:46 |
clarkb | I don't think lynx is installed in the mailman-core image so that may be an image bug | 16:47 |
clarkb | https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/handlers/docs/filtering.html#passing-and-filtering-extensions is the next thing. Filtering that list of file extensions by default seems reasonable | 16:48 |
clarkb | I'm not sure about process_bounces. Haven't found a good doc for it. I think that is having mailman process the bounces for sent email (and possibly unsubbing people?) | 16:50 |
clarkb | https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/handlers/docs/filtering.html#passing-mime-types for the last thing. That also seems reasonable | 16:51 |
clarkb | Let me know if you think we should be setting those to match the imported list when we create new lists and I can work on updating our ansible for that | 16:52 |
fungi | clarkb: yes, turning off process_bounces was a workaround for dmarc-enforcing recipients getting unsubscribed when their mtas rejected messages with broken dkim sigs | 16:54 |
fungi | well, not really unsubscribed, but by default their subscriptions get set to non-delivering | 16:54 |
clarkb | gotcha. Seems like we should turn that off by default then. At least to start that seems like a good safety toggle | 16:54 |
clarkb | I'll work on a patch | 16:55 |
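Not the actual ansible change, just a sketch of the equivalent REST call for the safety toggle discussed above; the port, credentials, list ID, and the string encoding of the boolean are all assumptions, and the other differing options (convert_html_to_plaintext, filter_extensions, pass_types) would be copied from the imported list's config dump rather than invented here:

```python
import requests

REST = "http://localhost:8001/3.1"
AUTH = ("restadmin", "restpass")  # assumed default credentials

resp = requests.patch(
    f"{REST}/lists/service-discuss.lists.opendev.org/config",
    # avoid DMARC-triggered delivery disables; string form assumed for the
    # form-encoded request body
    data={"process_bounces": "false"},
    auth=AUTH,
)
resp.raise_for_status()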
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 17:01 |
fungi | right, if we switch to using some of the active dmarc mitigation mechanisms, we can probably go back to processing bounce messages again | 17:04 |
fungi | though it not only helps avoid mass-unsub when people send dkim-signed messages to a list, it also protects against mass-unsub events when somebody decides to put the server on a spam blocklist | 17:05 |
fungi | but on the flip side of that, it means we'll continue sending to more and more invalid addresses over time as people change jobs or delete accounts without fixing their subscriptions | 17:05 |
fungi | so some mail systems may decide the listserv is spammer-controlled because it's constantly sending to lots of dead addresses | 17:06 |
*** efoley_ is now known as efoley | 17:24 | |
*** pojadhav|ruck is now known as pojadhav|out | 17:53 | |
clarkb | re lynx that might be another vote towards modifying the upstream images since we can layer that in pretty easily | 18:40 |
fungi | good point | 18:52 |
*** dasm is now known as dasm|off | 19:21 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Install refstack with openstack constraints https://review.opendev.org/c/opendev/system-config/+/855279 | 19:51 |
clarkb | infra-root ^ refstack is erroring because cryptography 37 removed a method used for rsa key verification. I expect that change to fix things by pinning cryptography to an older version, but I want to make sure it builds properly and check the installed versions first | 19:56 |
clarkb | And now lunch, then I'll look at cleaning up that mm3 change a bit | 20:00 |
diablo_rojo | Thank you clarkb ! | 20:01 |
fungi | i'll start the openstack-discuss import test by analyzing the amount of data i need to rsync over from prod | 20:01 |
fungi | df is taking... a while | 20:15 |
fungi | er, du i mean | 20:16 |
fungi | 22G /srv/mailman/openstack | 20:33 |
fungi | yeeowch | 20:34 |
fungi | but the held node has plenty of room to spare, so not too concerned. the rsync will just take a while | 20:34 |
fungi | because of the container uid/gid issue, the easiest solution is to rsync over to ~ as mailman and then mv the directory into ~mailman/import and chown it to the container-friendly owner | 20:45 |
fungi | but at least that way if we mount a volume at ~mailman for all the containers, it will be a quick atomic move | 20:45 |
clarkb | bah no wget and no curl on the python-builder image | 21:17 |
clarkb | that's ok, I can install it. but I should've checked that first | 21:18 |
clarkb | oh we already install curl. /me changes to that one | 21:18 |
opendevreview | Clark Boylan proposed opendev/system-config master: Install refstack with openstack constraints https://review.opendev.org/c/opendev/system-config/+/855279 | 21:20 |
ianw | weird that refstack depends on a rsa key thing from cryptography...? | 21:22 |
clarkb | ianw: it uses pubkey auth for some reason | 21:29 |
clarkb | I think because the idea was you'd run it from a script in a system updating your results rather than interactively via a web browser | 21:29 |
clarkb | but yes | 21:29 |
fungi | rsync is roughly half done | 21:34 |
*** lbragstad1 is now known as lbragstad | 21:35 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 21:47 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM force mm3 failure to hold the node https://review.opendev.org/c/opendev/system-config/+/855292 | 21:47 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 21:54 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM force mm3 failure to hold the node https://review.opendev.org/c/opendev/system-config/+/855292 | 21:54 |
clarkb | ok that restores all the regular CI jobs and adds an infra prod job | 21:55 |
clarkb | and cleans up a number of todos and so on. I think we can call this ready for review now. I'm going to put a hold on 855292 so that we can double check that things like the pipermail redirect work as expected | 21:55 |
clarkb | I'm keeping the old hold in place because fungi is still using it to test the migration of service-discuss | 21:55 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/855279 appears to have installed cryptography 36.0.2 as expected | 22:00 |
fungi | clarkb: thanks, yep i'm shoving more data to it now in fact | 22:04 |
fungi | nearly done copying | 22:04 |
clarkb | fungi: we should probably pay attention to timing data for the openstack-discuss migration too just to get an idea of how long their downtime will last | 22:05 |
clarkb | also I think I might have fixed the auto field warnings. | 22:05 |
clarkb | You're supposed to set the path to the class as a string and not the actual class object | 22:05 |
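Assuming those warnings are Django's auto-created-primary-key checks, the fix in the settings file looks like this; the setting takes a dotted-path string that Django resolves at startup, not an imported class:

```python
# settings.py
# Correct: a dotted-path string.
DEFAULT_AUTO_FIELD = "django.db.models.AutoField"

# Wrong (per the observation above): assigning the class object rather than
# its path, e.g.
#   from django.db import models
#   DEFAULT_AUTO_FIELD = models.AutoField
```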
clarkb | wow we run a lot of jobs against 851248 when I put them back in. I wonder what I edited to trigger that | 22:16 |
clarkb | bah something broke. Maybe the auto field thing? we set no log though :/ | 22:21 |
clarkb | I'll be able to check it on the held node soon enough | 22:22 |
fungi | rsync of the openstack site took approximately two hours to complete. for the real migration, i expect we'll seed the copy while everything is running, then do one final rsync once services are stopped in order to minimize downtime from that | 22:25 |
clarkb | ++ | 22:27 |
clarkb | ok my held node change didn't fail | 22:27 |
clarkb | well, it will fail due to the forced test failure, but it looks like it passed otherwise? | 22:27 |
clarkb | it's failing on the django admin user creation step which was always flaky before because you have to delay for enough of the db to be ready. I thought I had done that but apparently not | 22:30 |
clarkb | er I thought I had added sufficient checks for the db to be ready | 22:30 |
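A minimal sketch of the sort of readiness check involved, assuming a MariaDB/MySQL backend reachable from the deployment node; the connection details are placeholders, and being able to connect is not by itself enough — the django migrations also need to have finished (e.g. via `manage.py migrate --check`):

```python
import time
import pymysql  # assumption: the mailman-web database is MariaDB/MySQL

def wait_for_db(retries=30, delay=5):
    """Return True once the database accepts connections."""
    for _ in range(retries):
        try:
            conn = pymysql.connect(host="localhost", user="mailman",
                                   password="secret", database="mailmanweb")
            conn.close()
            return True
        except pymysql.err.OperationalError:
            time.sleep(delay)
    return False

if not wait_for_db():
    raise SystemExit("database never became ready")
```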
clarkb | I think the auto field thing is working comparing https://zuul.opendev.org/t/openstack/build/915494ed931a4fdeb3794bf82fe6bae8/log/job-output.txt#20442 to https://zuul.opendev.org/t/openstack/build/ba392150076f475d9eae0a90f7efcdca/log/job-output.txt#20715 | 22:33 |
clarkb | fungi: 104.130.4.104 is the newly held node. That one can be used to check the pipermail stuff when you are done with openstack-discuss. But no rush | 22:34 |
*** dviroel is now known as dviroel|out | 22:37 | |
fungi | i took a sushi break, but am going to start testing the import now (and timing each step) | 22:40 |
fungi | okay, got everything moved to the right place and ownership applied to the files, import is proceeding | 22:51 |
fungi | i expect this to take a while | 22:51 |
fungi | actually, the first stage is already completed: real 1m54.675s | 22:53 |
fungi | hopefully the other steps go as quickly | 22:54 |
fungi | the archive import will probably be the slowest step | 22:54 |
fungi | it's to the point of displaying the counter | 22:56 |
fungi | it reported skipping a message | 22:56 |
clarkb | did it indicate why? | 22:57 |
fungi | MySQLdb._exceptions.OperationalError: (1153, "Got a packet bigger than 'max_allowed_packet' bytes") | 22:57 |
fungi | maybe we need to tune the db? | 22:57 |
fungi | it's up to 10% imported now and so far just the one skipped message, at least | 22:58 |
clarkb | interesting. I wonder if that is a mismatch between client lib expectations and mariadb | 22:58 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 22:59 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM force mm3 failure to hold the node https://review.opendev.org/c/opendev/system-config/+/855292 | 22:59 |
clarkb | That adds another attempt at checking that the db is in a steady state | 22:59 |
clarkb | I'm going to rotate out the newer node that I held | 22:59 |
fungi | archive import is up to 1/3 complete now | 23:01 |
clarkb | wow much quicker than I anticipated | 23:02 |
clarkb | fungi: don't forget to check the size of the xapian index dirs afterwards | 23:02 |
fungi | yep | 23:03 |
fungi | they were pretty small after the opendev list imports, at least | 23:03 |
clarkb | and then I guess we need to look into that skipped message | 23:05 |
fungi | i can almost guarantee it's because of some massive attachment | 23:05 |
clarkb | https://dba.stackexchange.com/questions/886/changed-max-allowed-packet-and-still-receiving-packet-too-large-error | 23:06 |
fungi | unfortunately, all it gave me to go on is an opaque gmail message-id | 23:06 |
fungi | clarkb: yeah, i looked through a few similar posts. basically sounds like we could increase it in the container's my.cnf | 23:06 |
clarkb | yup | 23:06 |
fungi | if we want to preserve this one message | 23:07 |
fungi | (i mean, there might be more, but this is almost 3/4 done already and just the one skip so far) | 23:07 |
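A quick way to confirm what the running server will currently accept, and therefore how large the my.cnf override needs to be; the connection details are placeholders and the 64M figure is only an example, not a tested value:

```python
import pymysql  # assumption: talking to the mariadb container directly

conn = pymysql.connect(host="localhost", user="root", password="secret")
with conn.cursor() as cur:
    cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
    name, value = cur.fetchone()
    print(name, int(value) // (1024 * 1024), "MiB")
conn.close()

# Raising it would then be a my.cnf override along the lines of:
#   [mysqld]
#   max_allowed_packet = 64M
```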
clarkb | I guess that means that the actual mail content is in mysql? | 23:07 |
clarkb | no more mbox or whatever typical email storage? | 23:07 |
fungi | seems that way | 23:07 |
clarkb | and then xapian indexes what is in mysql | 23:08 |
fungi | it might also keep an mbox copy too, i don't know | 23:08 |
clarkb | ya I'm mostly trying to figure out how we estimate storage space needs | 23:08 |
clarkb | but I suppose if we deploy with /var/lib/mailman on lvm backed by volumes we can roughly ballpark it and then adjust later if necessary | 23:09 |
fungi | grepping the mbox found me the message: https://lists.openstack.org/pipermail/openstack-discuss/2018-November/000114.html | 23:12 |
fungi | 17MiB of attachments, looks like | 23:15 |
fungi | though the last one is the vast majority of that | 23:16 |
fungi | real 18m42.199s | 23:16 |
fungi | not bad for the import | 23:16 |
fungi | now for the reindex | 23:16 |
fungi | underway | 23:17 |
clarkb | I think I'm ok losing that message. However, maybe this indicates new messages with large attachments will also fail. Not sure if that is desirable | 23:17 |
fungi | Indexing 30183 emails | 23:20 |
fungi | and yeah, i feel like we should probably adjust the db config and do more test imports | 23:21 |
fungi | we should probably perform test imports of all our existing lists on a held node in order to shake out potential gotchas before we migrate them for real | 23:21 |
fungi | just to reduce the chances of aborting a maintenance window and possibly leaving ourselves with a mess to clean up | 23:22 |
fungi | also good for getting more accurate timing data for the actual migrations | 23:22 |
fungi | real 6m41.204s | 23:30 |
fungi | so wall clock time for all three steps 1m54.675s + 18m42.199s + 6m41.204s = 27m18.078s | 23:34 |
fungi | 1.2G /var/lib/mailman/web-data/fulltext_index | 23:39 |
fungi | 687M /var/lib/mailman/database | 23:39 |
fungi | 1.4G /var/lib/mailman/import/lists.openstack.org/archives/private/openstack-discuss | 23:40 |
fungi | for comparison | 23:40 |
fungi | so between the database and the search index, the on-disk size is around a third larger than the original | 23:42 |
fungi | nothing worrisome, i don't think | 23:42 |
fungi | we only have 46G used on / in prod, much of which is the operating system, so a 100gb volume would be plenty for years to come, but we could do 250gb just to be safe | 23:44 |
clarkb | Not too bad | 23:57 |
clarkb | my new check for django migrations isn't working and it isn't clear to me why | 23:57 |
*** rlandy is now known as rlandy|out | 23:57 | |
clarkb | bah, because the ansible command module doesn't do | | 23:58 |