| *** dmellado2 is now known as dmellad | 07:14 | |
| *** dmellad is now known as dmellado | 07:14 | |
| *** janders1 is now known as janders | 13:12 | |
| *** diablo_rojo_phone is now known as Guest26643 | 13:12 | |
| opendevreview | sean mooney proposed zuul/zuul-jobs master: make distro and pypi mirror configuration conditional https://review.opendev.org/c/zuul/zuul-jobs/+/961369 | 13:28 |
| *** janders0 is now known as janders | 13:49 | |
| *** open10k8s_ is now known as open10k8s | 13:49 | |
| *** clarkb is now known as Guest26673 | 13:57 | |
| *** acoles_ is now known as acoles | 14:00 | |
| fungi | the mirror.ubuntu volume move has been going for 26 hours at this point | 14:41 |
| fungi | if it only takes as long as mirror.ubuntu-ports then it should finish by the time of our meeting, though i have a feeling it will take longer since it still has bionic packages | 14:42 |
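For reference, the long-running move fungi describes is an OpenAFS `vos move`; a minimal sketch of the sequence, with fileserver and partition names as placeholders (the actual servers are not named in the log):

```bash
# Move the read-write volume to a new fileserver/partition. OpenAFS keeps
# the volume online while copying, which is why this can run for 26+ hours.
vos move mirror.ubuntu -fromserver afs01.example.org -frompartition vicepa \
    -toserver afs02.example.org -topartition vicepa -localauth

# Once the move completes, push the RW contents out to the read-only
# replicas that the mirror web servers actually serve from.
vos release mirror.ubuntu -localauth
```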
| *** Guest26673 is now known as clarkb | 14:46 | |
| mnasiadka | Ah, that's why some Kolla Ubuntu builds are failing ;-) | 15:13 |
| fungi | mnasiadka: which ones? arm64 on bionic? | 15:14 |
| mnasiadka | nope, some stale mirror it seems for Noble | 15:15 |
| mnasiadka | https://zuul.opendev.org/t/openstack/build/30691c49d8cb485093a2958a59ea2b25/log/kolla/build/000_FAILED_manila-share.log | 15:15 |
| fungi | the ubuntu mirror should only be at most 28 hours stale right now | 15:16 |
| mnasiadka | I think the discrepancy is Kolla using ubuntu cloud archive upstream and the standard Noble repos from mirror | 15:18 |
| mnasiadka | Is there a UCA mirror in OpenDev - or should we think of having one after all these moves? | 15:18 |
| fungi | aha, yeah uca may have newer dependencies than are in our mirror if they updated libsqlite3-0 in the past day | 15:19 |
| fungi | packages.ubuntu.com is being really obstinate in returning content for the updated package | 15:22 |
| fungi | looks like it may even be disjoint on their own mirror network | 15:22 |
| clarkb | mnasiadka: we have a uca mirror but it is updated separately from the main repos | 15:22 |
| clarkb | so the same disconnect can occur I think | 15:23 |
| mnasiadka | oh well | 15:23 |
| mnasiadka | even if these ran in the same cron script - I assume disconnects could occur | 15:24 |
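A rough illustration of the skew mnasiadka is describing: two apt sources refreshed on independent schedules. The mirror hostname and the UCA pocket name here are assumptions for the sketch:

```bash
# OpenDev's ubuntu mirror, refreshed by the reprepro cron discussed below:
#   deb http://mirror.dfw.rax.opendev.org/ubuntu noble-updates main universe
# Ubuntu Cloud Archive, fetched directly from upstream by the kolla builds:
#   deb http://ubuntu-cloud.archive.canonical.com/ubuntu noble-updates/dalmatian main

# If UCA ships packages built against a libsqlite3-0 newer than the mirror
# has synced yet, installs fail; the skew is visible with:
apt-cache policy libsqlite3-0
```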
| fungi | well, hopefully the mirror will be up to date by tomorrow | 15:24 |
| fungi | https://changelogs.ubuntu.com/changelogs/pool/main/s/sqlite3/sqlite3_3.45.1-1ubuntu2.5/changelog is persistently returning a 404 response | 15:27 |
| fungi | seems like this may be an update in progress | 15:27 |
| clarkb | https://opendev.org/openstack/kolla-ansible/commit/3be5a3852def1b4580c439809067bda7a2c49730 exists now so something must've triggered broader replication for kolla-ansible | 15:50 |
| fungi | eventually consistent replicas | 15:56 |
| clarkb | fungi: I rechecked https://review.opendev.org/c/opendev/system-config/+/958666 so that we can get that ready for merging when you're ready on the afs side of things | 15:59 |
| fungi | thanks! | 15:59 |
| opendevreview | Merged opendev/system-config master: Expand old chrome UA rules https://review.opendev.org/c/opendev/system-config/+/960399 | 16:04 |
| clarkb | I'm starting to look very briefly at summit scheduling and I think Friday night might be the only night I personally have free to try and do a get-together | 16:08 |
| clarkb | so if there is interest in having an opendev (and probably zuul) dinner type thing try to keep friday night free! | 16:08 |
| clarkb | fungi and corvus are going to be there, not sure who else is ^ but pass that along I guess. Note I don't think I'll be organizing anything super formal, more of just a let's aim for that day and then find something that works | 16:09 |
| *** dan_with__ is now known as dan_with | 16:10 | |
| corvus | "try to coalesce into a social blob and imbibe sustenance" sounds like a plan! :) | 16:12 |
| fungi | infra-root: 960399 just finished deploying (to gitea, mailman, static, and zuul), so keep an eye out for any reports of rejected web requests on those sites/services | 16:21 |
| fungi | i guess we don't update gerrit with those | 16:21 |
| clarkb | we've been selective in deploying that to things that have had struggles with the crawlers I think | 16:22 |
| fungi | i couldn't remember which services used it and whether we remembered to trigger deployment jobs for all of them when the file changes | 16:26 |
| fungi | interesting... there's a lists01.openstack.org/main01 cinder volume in rax-dfw which isn't in use, created_at is the same day as the lists01.opendev.org server instance | 17:49 |
| fungi | aside from having the wrong name, it's also sata not ssd | 17:50 |
| clarkb | it's possible the sata volumes are also speedy? we can probably profile that somewhere. Or just use ssd because we know we're sensitive to iowait and want to avoid the problem as much as possible | 17:50 |
| fungi | i'm checking the current volume of data in /var/lib/mailman to confirm whether the minimum 100gb volume size will still be sufficient or if we need something larger | 17:51 |
| clarkb | don't forget to check the database size too | 17:51 |
| fungi | it's also in there | 17:51 |
| clarkb | ah ok | 17:52 |
| *** Guest26643 is now known as diablo_rojo_phone | 17:52 | |
| fungi | /var/lib/mailman/database on the host is mounted as /var/lib/mysql in the mariadb container | 17:52 |
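A sketch of the mount arrangement fungi describes, written as the equivalent `docker run` (the real deployment uses a compose file in system-config; the container name and image tag here are assumptions):

```bash
# The host directory holding mailman's database files is what the mariadb
# container sees as its datadir, so /var/lib/mailman captures everything.
docker run --name mailman-database \
    -v /var/lib/mailman/database:/var/lib/mysql \
    -d mariadb:lts
```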
| fungi | 41G /var/lib/mailman | 18:13 |
| fungi | so 100gb should be plenty | 18:13 |
| clarkb | might want to give some headroom? | 18:15 |
| clarkb | I think that the indexes in particular may be a concern if xapian is anything like lucene | 18:15 |
| clarkb | I guess with lvm we can add a second volume and expand the fs without too much fuss if that becomes necessary | 18:16 |
| fungi | that seems like ample headroom anyway, but yes we can always use pvmove onto a larger volume if we want with no downtime | 18:16 |
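A sketch of the pvmove-based migration fungi alludes to, assuming the volume group is named `main` (consistent with the `/dev/main/mailman` path below) and the replacement cinder volume attaches as /dev/xvdc:

```bash
# Add the new, larger volume to the existing volume group
pvcreate /dev/xvdc
vgextend main /dev/xvdc

# Migrate all extents off the old volume; the filesystem stays mounted and
# writable the whole time
pvmove /dev/xvdb /dev/xvdc

# Drop the old volume from the VG, then grow the LV and the filesystem
vgreduce main /dev/xvdb
lvextend -l +100%FREE /dev/main/mailman
resize2fs /dev/main/mailman
```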
| fungi | i've created a 100gb ssd volume named lists01.opendev.org/main01 and attached it to the server as /dev/xvdb, put a logical volume on it using all available extents and formatted it ext4 | 18:18 |
| fungi | i've temporarily mounted /dev/main/mailman at /mnt and will get an initial rsync going to it from /var/lib/mailman | 18:19 |
| fungi | `rsync -Sax --delete --info=progress2 /var/lib/mailman/ /mnt/` is running in a root screen session on lists01 | 18:22 |
| fungi | in other news, the mirror.ubuntu volume move is approaching the 30 hour mark, still going from what i can tell | 18:26 |
| frickler | looks like the mirror updates are still running, just the vos release is failing for them? might be better to stop those until the move is finished? | 18:41 |
| fungi | we could temporarily comment out the ubuntu reprepro cronjob and put the mirror-update server in the emergency disable list, though it's probably not got much longer to go at this point | 18:52 |
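Sketched out, the pause fungi describes would look something like this (the crontab match and the exact emergency-list mechanics are assumptions, not copied from system-config):

```bash
# On mirror-update, comment out the ubuntu reprepro cron entry in place:
crontab -u root -l | sed '/reprepro.*ubuntu/ s/^/#/' | crontab -u root -

# ...then add the server to the emergency disable list on the bastion host,
# so the periodic ansible runs don't reinstate the crontab before the
# vos move finishes.
```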
| fungi | omw to an early dinner, bbl | 19:45 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Switch generic container role image builds back to docker https://review.opendev.org/c/opendev/system-config/+/961410 | 19:47 |
| clarkb | this is the change to flip our default image builder back to docker (everything should already be using docker due to explicit overrides, this is just ensuring our default matches our intention going forward) | 19:47 |
| clarkb | fungi: I left some notes on your etherpad. Overall looks good just some minor things to consider | 19:51 |
| fungi | i suppose i could rename to something other than /var/lib/mailman.old that wouldn't risk getting backed up in the few minutes between those steps | 21:25 |
| fungi | it's on the rootfs so a rename to basically anywhere would still be atomic | 21:26 |
| clarkb | ya that would also work | 21:26 |
| clarkb | I just don't want us to accidentally make backups carry data we don't want | 21:26 |
| fungi | i'm not super familiar with what we include/exclude. what path would you recommend? | 21:28 |
| clarkb | fungi: https://opendev.org/opendev/system-config/src/commit/03ba936444f4b2e42b08981b549e53b90b267814/playbooks/roles/borg-backup/defaults/main.yaml this is the default set of rules | 21:29 |
| clarkb | /var/cache might be a good choice | 21:29 |
| clarkb | I think that isn't mounted as tmpfs or anything | 21:29 |
| clarkb | and it is in the exclude list | 21:30 |
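Assembled from the discussion above, a sketch of the eventual cutover (the compose file path is an assumption; the real steps live in the etherpad):

```bash
# 1. Stop the mailman containers so nothing writes during the final pass
docker-compose -f /etc/mailman-compose/docker-compose.yaml down

# 2. Final rsync to pick up changes since the initial pre-sync
rsync -ax --delete /var/lib/mailman/ /mnt/

# 3. Park the original under /var/cache (excluded from borg backups; the
#    rename is atomic since both paths are on the rootfs), then remount the
#    new volume in its place
mv /var/lib/mailman /var/cache/mailman.old
umount /mnt
mkdir /var/lib/mailman
mount /dev/main/mailman /var/lib/mailman   # plus a matching /etc/fstab entry

# 4. Bring services back up; keep /var/cache/mailman.old until all is well
docker-compose -f /etc/mailman-compose/docker-compose.yaml up -d
```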
| fungi | running fio on /mnt now following the same options you last used on the rootfs for comparison | 21:32 |
| clarkb | note it created the four fungi-test.* files. Don't delete them until you're done as it will reuse them if you run multiple passes | 21:33 |
| fungi | READ: bw=111MiB/s (116MB/s), 111MiB/s-111MiB/s (116MB/s-116MB/s), io=3337MiB (3499MB), run=30036-30036msec | 21:33 |
| clarkb | fungi: if you look near the top of the output it gives what I think is a better summary as it includes iops | 21:33 |
| fungi | bw ( KiB/s): min=109654, max=115386, per=100.00%, avg=113857.77, stdev=330.62, samples=239 | 21:34 |
| fungi | iops : min=27412, max=28846, avg=28464.11, stdev=82.68, samples=239 | 21:34 |
| fungi | that? | 21:34 |
| clarkb | no it looks like read: IOPS=814, BW=3260KiB/s (3338kB/s)(95.8MiB/30085msec) | 21:35 |
| clarkb | so it includes IOPS and bw on the same line which I feel is a better summary | 21:35 |
| fungi | read: IOPS=28.4k, BW=111MiB/s (116MB/s)(3337MiB/30036msec) | 21:35 |
| clarkb | fungi: and was that for read or randread? | 21:36 |
| clarkb | (I think they both say read: in the output unfortunately) | 21:36 |
| fungi | i don't see a randread in the output | 21:37 |
| clarkb | fungi: its in the command you run --rw=read or --rw=randread iirc | 21:37 |
| clarkb | it selects which type of test to perform. | 21:38 |
| fungi | oh, i ran it with --rw=read | 21:38 |
| fungi | since that was the last way you ran it | 21:38 |
| clarkb | ya I was collecting both sets of data | 21:38 |
| clarkb | read is sequential and randread is random. I think the randread value is more important here (at least it was the test that did not do great on the existing disk) | 21:39 |
| clarkb | so just rerun the command with the different test selected and it should spit out a similar report | 21:39 |
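The exact fio invocation isn't pasted into the log, but the reported numbers (four fungi-test.* files, ~30 s runs, and 28.4k IOPS at 4 KiB blocks working out to 111 MiB/s) are consistent with something like the following; `--size` is a guess:

```bash
# Sequential read baseline
fio --name=fungi-test --directory=/mnt --rw=read --bs=4k --size=1g \
    --numjobs=4 --time_based --runtime=30 --group_reporting

# Random read, the number that matters more for mailman's workload
fio --name=fungi-test --directory=/mnt --rw=randread --bs=4k --size=1g \
    --numjobs=4 --time_based --runtime=30 --group_reporting
```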
| clarkb | but both pieces of info are helpful | 21:39 |
| fungi | https://paste.opendev.org/show/bwMpEzmUCJPT7TCZTNXR/ | 21:41 |
| fungi | that's for randread | 21:41 |
| clarkb | that looks very similar to mirror randread `read: IOPS=20.0k, BW=78.2MiB/s (82.0MB/s)(2346MiB/30004msec)` | 21:41 |
| clarkb | so I don't think switching to the performance flavor is going to be any better for us | 21:42 |
| fungi | so maybe ssd and sata performance are similar | 21:42 |
| clarkb | fungi: I ran that on the mirrors root disk not its cache volume fwiw | 21:42 |
| clarkb | in any case that is much much better than the current randread ~1k IOPS number so yes this should be an improvement | 21:42 |
| fungi | ah, so could also be that the rootfs on performance is ssd or has similar performance characteristics | 21:42 |
| clarkb | yup | 21:42 |
| fungi | plan in the etherpad is updated based on your feedback | 21:46 |
| clarkb | what does rsync -S get us here? Just smaller transfers? | 21:50 |
| clarkb | my main concern is that some things may not like that. I know we can't sparse out swap files anymore for example. Wonder if mariadb in particular might have issues with that | 21:50 |
| fungi | oh, old habit, i can drop it and rerun the pre-sync | 21:50 |
| fungi | edited the plan | 21:51 |
| clarkb | ya just thinking if the database is preallocating space for $reasons that might not produce a happy result (I have no evidence this is the case but do know databases do wizardly things with disk stuff) | 21:51 |
| clarkb | I think the plan lgtm now | 21:51 |
| fungi | i want to say ~ancient rsync did not preserve sparseness and the meaning of the -S option changed over time | 21:52 |
| clarkb | my local manpage says `turn sequences of nulls into sparse blocks` | 21:54 |
| clarkb | oh one other thought: Maybe we shut down everything, then do a mysqldump backup, then shut down mariadb | 21:54 |
| clarkb | *shut down everything but mariadb | 21:54 |
| clarkb | that way we have a database backup that doesn't rely on the underlying backing files (in theory, since we're moving around on the same filesystem with no applications running, that is safe anyway, but belts and suspenders if rsync does something we don't expect) | 21:55 |
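The belt-and-suspenders dump clarkb suggests would be roughly the following; the container name and the root-password environment variable (from the upstream mariadb image) are assumptions:

```bash
# With everything but mariadb stopped, take a consistent logical dump that
# doesn't depend on the InnoDB files being copied correctly:
docker exec mailman-database sh -c \
    'mysqldump -uroot -p"$MARIADB_ROOT_PASSWORD" --all-databases --single-transaction' \
    > /root/mailman-pre-move-dump.sql

# ...then stop mariadb itself before the final rsync.
```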
| clarkb | I don't know how extra careful we want to be | 21:56 |
| fungi | or maybe we just don't do the final rm of the original directory we moved until we're sure all is well? | 21:59 |
| clarkb | ya that is another good option | 21:59 |
| clarkb | since a mv should be largely transparent if we need to shift things back | 22:00 |
| fungi | yes, the files are being left entirely untouched, as long as we assume rsync doesn't modify the source side in any way, which has always been my understanding | 22:01 |
| clarkb | ya I think we can probably trust rsync there | 22:03 |
| fungi | i made a note | 22:05 |
| fungi | napkin math, taking the slow rsync reads from the rootfs into account, puts the outage at ~30 minutes | 22:06 |
| fungi | inbound deliveries should get temporarily queued by exim while mailman is offline, so posts will only be delayed in theory | 22:07 |
| clarkb | and in theory they should retry for like a day or two right? | 22:07 |
| clarkb | but I think if we announce it even if some deliveries fail thats ok | 22:08 |
| fungi | maybe end of this week, friday afternoon my time? say... 20:00 utc? | 22:13 |
| fungi | mail volume tends to be really low then | 22:14 |
| clarkb | I should be around then | 22:14 |
| fungi | i'll go ahead and send something to service announce for that time | 22:18 |
| fungi | sent | 22:30 |
| clarkb | fungi: any idea if I need to moderate it through? I don't see it in my inbox but I also don't see the moderation request either | 22:33 |
| clarkb | maybe I just need to be patient | 22:33 |
| fungi | i received it right away | 22:33 |
| clarkb | I just got it | 22:34 |
| clarkb | so yes patience was required | 22:34 |
| fungi | i blocked out 19:30-21:30 utc on my calendar for it, just in case | 22:36 |