*** gyee has quit IRC | 01:04 | |
*** rcernin has quit IRC | 02:26 | |
*** rcernin has joined #openstack-swift | 02:49 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-swift | 04:33 | |
*** mikecmpbll has quit IRC | 05:15 | |
*** mikecmpbll has joined #openstack-swift | 05:18 | |
*** dsariel has joined #openstack-swift | 05:56 | |
*** rpittau|afk is now known as rpittau | 07:22 | |
*** rcernin has quit IRC | 07:35 | |
*** rcernin_ has joined #openstack-swift | 07:35 | |
*** rcernin_ has quit IRC | 07:42 | |
*** MooingLemur has quit IRC | 07:45 | |
*** MooingLemur has joined #openstack-swift | 07:46 | |
*** mikecmpbll has quit IRC | 08:10 | |
*** tkajinam is now known as tkajinam|away | 09:17 | |
*** tkajinam|away is now known as tkajinam | 09:17 | |
*** rcernin_ has joined #openstack-swift | 09:56 | |
*** rcernin_ has quit IRC | 10:25 | |
*** rcernin_ has joined #openstack-swift | 10:42 | |
*** rcernin_ has quit IRC | 11:06 | |
*** fingo has quit IRC | 11:25 | |
*** rcernin_ has joined #openstack-swift | 11:59 | |
*** tkajinam has quit IRC | 14:20 | |
*** rcernin_ has quit IRC | 14:27 | |
*** dsariel has quit IRC | 15:11 | |
*** gyee has joined #openstack-swift | 15:39 | |
*** rpittau is now known as rpittau|afk | 15:52 | |
ormandj | is there a backport of https://opendev.org/openstack/swift/commit/754defc39c0ffd7d68c9913d4da1e38c503bf914 to ussuri? | 16:28 |
ormandj | with victoria being 20.04 only, and that being a critical issue for us, we're hoping it's possible ;) | 16:29 |
timburke | ormandj, not yet. i haven't checked how cleanly it would apply, but i can look into it. fwiw, though, i wholly expect victoria swift to work on older versions of ubuntu, and to play well with an otherwise-ussuri openstack install | 16:32 |
ormandj | timburke: i think ubuntu cloud archive is only building for >=20.04 | 16:40 |
ormandj | we're working on getting that all together because some of the fixes in victoria are pretty huge for the big ticket issues we have | 16:40 |
ormandj | but it's not an overnight process | 16:40 |
timburke | huh. http://ubuntu-cloud.archive.canonical.com/ubuntu/dists/focal-updates/victoria/main/binary-amd64/Packages lists swift 2.25.1... though 2.26.0 is in focal-proposed, so i guess everything's on-track | 16:52 |
timburke | i don't see any differences looking at the package dependencies (which makes sense; we make a point of not bumping deps unnecessarily) so i think you might be good to just pull down the victoria swift packages and install them on bionic | 16:54 |
timburke | (fair warning: i've never tried it. indeed, i usually don't use distro packages at all -- i'm usually working from source, being a dev and all, and when we need to upgrade swift on our clusters, we build our own packages) | 16:56 |
ormandj | timburke: yeah, we'll try to figure it out | 17:11 |
ormandj | second one, testing a mass rebalance on dev nodes, some data has gone 404. using swift-get-nodes to get the location then checking each of the primary/handoffs, the data isn't there | 17:12 |
ormandj | by mass rebalance i mean adding a new node into the ring and putting weight at 100% effectively at once | 17:12 |
ormandj | is that expected due to partition location changes and self-rectifying as rebalancing completes? | 17:12 |
ormandj | i didn't expect that, and i haven't used swift-get-nodes on the old ring files to see if the 'old' data still exists | 17:13 |
ormandj | but we're definitely serving 404s now for data that used to be there | 17:13 |
timburke | ormandj, triple replica, right? how quickly did you rebalance the ring? ever since https://github.com/openstack/swift/commit/ce26e789 only one assignment should change per rebalance, so i would've expected the other two primary locations to still have it... | 17:20 |
timburke | how quickly *and how many times* | 17:20 |
ormandj | yes, triple replica | 17:22 |
timburke | you had three beefy nodes before, right? is the new one roughly the same size as the others, or considerably larger? | 17:23 |
ormandj | larger | 17:23 |
ormandj | one ring change to add it, at full capacity, then a month later, one more ring change, then a week later one more | 17:24 |
timburke | that seems perfectly reasonable -- do we know whether the object was still accessible at the intermediary stages? | 17:25 |
ormandj | no, 404ing almost at the very beginning | 17:26 |
ormandj | unfortunately don't have logs going back far enough to determine if a DELETE went through | 17:26 |
timburke | i was just about to ask about a sanity check there :-) | 17:27 |
ormandj | yeah, it's still in the container db | 17:27 |
ormandj | but it _is_ possible a DELETE went through for it, and the container db just didn't get the update | 17:27 |
ormandj | but i'd expect that to eventually have caught up, too | 17:27 |
ormandj | if all the data itself is actually purged | 17:27 |
timburke | you have this habit of answering my next question before i ask it :P | 17:27 |
ormandj | i just looked at the old rings (backups) and the locations the data 'should' be, checked those locations, it's definitely not there | 17:28 |
ormandj | based on the old rings | 17:28 |
ormandj | it's actually the same location as the 'new' rings show it should exist | 17:28 |
timburke | when you deliver new rings, do you have a feel for how long the gap is between the first node getting the new ring and the last one getting it? rledisez had a ~30min window that led him to observe https://bugs.launchpad.net/swift/+bug/1897177 | 17:30 |
openstack | Launchpad bug 1897177 in OpenStack Object Storage (swift) "Race condition in replication/reconstruction can lead to loss of datafile" [High,In progress] | 17:30 |
ormandj | about 5 seconds | 17:30 |
timburke | yeah, negligible. good | 17:30 |
timburke | check quarantine dirs? | 17:31 |
ormandj | hm, protip on doing that? | 17:31 |
ormandj | i did check the handoff nodes fwiw | 17:32 |
ormandj | not seeing a quarantined dir in the /srv/node/driveID | 17:34 |
timburke | hrm. yeah, i would've checked with something like `find /srv/node*/*/quarantined` | 17:35 |
ormandj | yeah, no such directory | 17:36 |
ormandj | the async_pending dir is big, though | 17:36 |
timburke | might have a delete record to send to the container | 17:40 |
timburke | how big is the container? | 17:40 |
ormandj | i'm sure huge | 17:41 |
ormandj | is there a way to look for a delete record pending? | 17:41 |
ormandj | we want to make sure this is a result of a client operation | 17:41 |
ormandj | not the server(s) | 17:41 |
ormandj | but we don't have client logs going back 2 months | 17:41 |
timburke | each file is just a pickled dict iirc -- https://review.opendev.org/#/c/725429/1/swift/cli/async_inspector.py almost seems too simple for me to bother pushing on ;-) | 17:43 |
patchbot | patch 725429 - swift - Add a tool to peek inside async_pendings - 1 patch set | 17:43 |
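A minimal sketch of peeking inside one async_pending by hand, assuming (per the message above) that each file really is just a pickled dict; the key names in the final comment ('op', 'account', 'container', 'obj', 'headers') and the latin1 fallback are from memory and may differ between Swift versions, so treat this as illustrative rather than as the linked async_inspector.py:

```python
import pickle
import pprint
import sys

# async_pendings live under /srv/node/<device>/async_pending*/<suffix>/<hash>-<timestamp>
path = sys.argv[1]

with open(path, 'rb') as fp:
    try:
        data = pickle.load(fp)
    except UnicodeDecodeError:
        # files written under py2 may need latin1 to unpickle under py3
        fp.seek(0)
        data = pickle.load(fp, encoding='latin1')

pprint.pprint(data)
# a pending DELETE would typically look something like
# {'op': 'DELETE', 'account': 'AUTH_...', 'container': '...', 'obj': '...', 'headers': {...}}
```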
timburke | how's your reclaim age? might want to bump it up if we're worried about how big the async pile is getting... | 17:44 |
timburke | https://gist.github.com/clayg/249c5d3ff580032a0d40751fc3f9f24b may be useful (both to get a feel for the state of the system, and as a starting point to go looking for a specific async) | 17:47 |
timburke | though given the suffix/hash from swift-get-nodes, it probably wouldn't be so hard to find it anyway... | 17:49 |
timburke | something like `find /srv/node*/*/async*/${SUFFIX}/${HASH}*` | 17:50 |
cwright | timburke: reclaim_age is set to 2592000 | 17:51 |
timburke | then, assuming you find something, crack it open and make sure it really was for a delete | 17:51 |
timburke | cwright, so 30 days -- that might not be long enough, if we're legit worried that this was deleted a couple months ago and never made it back to the container server... | 17:52 |
*** whmcr has joined #openstack-swift | 17:55 | |
timburke | :-/ i should help clayg make that async stats script work on py3 | 17:56 |
whmcr | @timburke we're assuming that suffix & hash are the parts of the filepath that come after the partition, i.e. /srv/node/DRIVEID/objects/PARTITIONID/SUFFIX/HASH; if so, no dice on that | 18:03 |
timburke | yup, that was the idea | 18:04 |
timburke | :-( | 18:04 |
*** djhankb has quit IRC | 18:07 | |
timburke | ok, so https://gist.github.com/tipabu/abf38940d49d67d33fe98b957f9306a6 should work on py3 and i taught it to report the age of the oldest async it can find | 18:17 |
whmcr | running it now | 18:20 |
whmcr | count is already >70k, oldest (so far at least) is from July | 18:21 |
timburke | 😳 | 18:23 |
*** dsariel has joined #openstack-swift | 18:23 | |
timburke | 70k may or may not be something to worry about, but we definitely need to bring that reclaim age up | 18:23 |
timburke | at whatever point the updater gets around to that async from July, it's not making any container requests; it's just going to delete it | 18:24 |
timburke | how's the updater tuned? in particular, what've you got for workers, concurrency, objects_per_second? | 18:25 |
timburke | probably also want to get a feel for your success rate for processing updates | 18:29 |
cwright | timburke: those three settings are still using defaults: concurrency = 8, updater_workers = 1, objects_per_second = 50 | 18:30 |
timburke | my gut says you probably want to turn up workers, considering how dense your chassis are | 18:33 |
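For reference, a hedged sketch of where those knobs live in a stock object-server.conf; the values are purely illustrative, not a recommendation from this conversation, and the exact semantics of concurrency vs. updater_workers vary by release (see the doc patch further down):

```ini
[object-updater]
# defaults quoted above: updater_workers = 1, concurrency = 8, objects_per_second = 50
updater_workers = 4        # more worker processes for a dense chassis (illustrative value)
concurrency = 8            # per-worker concurrency; check your release's docs for its exact meaning
objects_per_second = 50    # per-worker rate limit on update attempts
```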
timburke | ormandj, just had a thought, since you mentioned still having backups of the old rings. have you compared the results you get from swift-get-nodes between them? i would expect there to be at least one disk assignment that didn't change, though i suppose i could be wrong... | 18:36 |
whmcr | yup, we've done that against pre-adding the node | 18:36 |
whmcr | all match up | 18:36 |
timburke | wait, so *none* of the assignments changed with the rebalance? this seems increasingly like it *must've* been deleted, but who knows when... | 18:37 |
timburke | fwiw, one of the tricks we've got for server logs is to push them back into the cluster as part of log rotation, under a .logs account that only reseller admins can access | 18:40 |
*** djhankb has joined #openstack-swift | 18:40 | |
timburke | might want to check recon cache, looking for object_updater_sweep time | 18:43 |
whmcr | sorry, the drive assignments do change, but the files are not there on any of the versions we've checked | 18:44 |
openstackgerrit | Tim Burke proposed openstack/python-swiftclient master: Remove some py38 job cruft https://review.opendev.org/758479 | 18:47 |
timburke | whmcr, did all of the assignments change? just one? two? | 18:48 |
whmcr | looks like one of the non-[Handoff] ones changes, and then all of the [Handoff]s change | 18:49 |
timburke | so the primaries (non-handoffs) that *didn't* change should be pretty authoritative -- if they don't have it (either in objects/PARTITIONID/SUFFIX/HASH or quarantined/objects/HASH) it was most likely deleted | 19:02 |
timburke | when you found it in listings, was that from just one replica of the container DB, or looking across all of them? | 19:03 |
whmcr | listing was from an s3 client doing a GET on the container for an object listing | 19:04 |
timburke | might be worth doing direct listings to each container server with limit=1 and prefix=<object>, see if they seem to agree that it should exist | 19:08 |
timburke | or even drop into sqlite3 and query for it directly. if you go that route, note that you'll probably want to include a 'AND deleted in (0, 1)' clause to take advantage of the deleted,name index | 19:09 |
timburke | but then you can also see the tombstone row (if it exists) | 19:10 |
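A hedged sketch of the per-replica listing check; the helper names (swift.common.ring.Ring, swift.common.direct_client.direct_get_container) are recalled from memory and the account/container/object values are made up, so verify both against your installed version before leaning on the output:

```python
from swift.common.ring import Ring
from swift.common.direct_client import direct_get_container

# hypothetical names -- substitute the real account/container/object
account, container, obj = 'AUTH_example', 'example-container', 'path/to/missing/object'

ring = Ring('/etc/swift', ring_name='container')
part, nodes = ring.get_nodes(account, container)

for node in nodes:
    # ask each container replica directly whether it has a row for this name
    # (a replica whose DB is missing will raise a ClientException here)
    headers, listing = direct_get_container(
        node, part, account, container, prefix=obj, limit=1)
    print(node['ip'], node['device'], [row['name'] for row in listing])
```

And a sketch of the sqlite3 route, assuming the standard container schema (an object table with name/created_at/deleted columns) and an obviously made-up DB path:

```python
import sqlite3

db_path = '/srv/node/DRIVEID/containers/PART/SUFFIX/HASH/HASH.db'  # hypothetical path
obj_name = 'path/to/missing/object'                                # hypothetical name

with sqlite3.connect(db_path) as conn:
    # 'AND deleted IN (0, 1)' lets sqlite use the (deleted, name) index,
    # and also surfaces the tombstone row if one exists
    rows = conn.execute(
        "SELECT name, created_at, deleted FROM object "
        "WHERE name = ? AND deleted IN (0, 1)", (obj_name,)).fetchall()

for name, created_at, deleted in rows:
    print(name, created_at, 'DELETED' if deleted else 'live')
```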
openstackgerrit | Tim Burke proposed openstack/swift master: Clarify some object-updater settings https://review.opendev.org/758488 | 19:23 |
*** gregwork has quit IRC | 19:26 | |
ormandj | timburke: for clarity, reclaim_age being too short when asyncs haven't updated container.db means data that _should_ be purged, won't be, if the async container update doesn't go through prior? | 20:00 |
ormandj | ie: dark data will be left on system that shouldn't be there | 20:01 |
timburke | so reclaim_age can bite you two ways at the object-layer if it's too short: you might reap some object-server tombstones (*.ts files) before all of the *.data have had a chance to get cleaned up, leading to dark data -- OR you might give up on ever getting an async pending through to the container layer, leading to either dark data (if the async was for a PUT) or ghost listings (if the async was for a DELETE) | 20:06 |
ormandj | copy. we'll set it really large then until all this is caught up ;) the key is making sure it wouldn't result in objects going missing that shouldn't be | 20:06 |
timburke | at the container layer, a too-short reclaim age pretty much always leads to ghost listings, where one replica goes offline for a while then comes back and syncs with other copies that had & reclaimed a deleted row for some of the objects | 20:07 |
ormandj | we'll get that out of the way first, then crank up the updater workers, some of these container.dbs are showing an update time from july | 20:07 |
ormandj | with lots of asyncs pending for them | 20:08 |
timburke | nope -- having it too high just means you're using up some inodes "unnecessarily" -- i'd definitely err on the side of too high rather than too low | 20:08 |
ormandj | perfect | 20:08 |
ormandj | timburke: updating the worker count, anything else we can do to push these asyncs through? | 20:17 |
ormandj | containerdbs are on ssds | 20:17 |
timburke | ormandj, might check to see if you've got https://review.opendev.org/#/c/741753/ in your swift -- if not, you can kick up your container replicator interval to like 48hrs or something until asyncs settle down | 20:21 |
patchbot | patch 741753 - swift (stable/ussuri) - Breakup reclaim into batches (MERGED) - 1 patch set | 20:21 |
ormandj | looking | 20:23 |
ormandj | timburke: unfortunately, i don't think that's in the ussuri cloud packages we have | 20:25 |
ormandj | don't see the other_stuff function in the db.py | 20:25 |
timburke | the fix should be in 2.26.0, 2.25.1, and 2.23.2 | 20:29 |
timburke | again, you can work around it by temporarily prolonging your container-replicator cycle time -- it's just a thing we've seen where the replicator may hold a long lock while reclaiming deleted rows | 20:31 |
ormandj | yeah, those releases didn't get built in ubuntu cloud archive | 20:36 |
ormandj | 2.25.1 that is | 20:36 |
ormandj | just checked, latest is still 2.25.0 | 20:36 |
ormandj | we'll set the replication interval to 48 hours for the container replicator, update the updater_workers to 4, and set reclaim_age to 120 days | 20:37 |
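For the record, a hedged sketch of what that plan might look like in the config files, assuming stock section names; the numbers are just the seconds equivalents of the figures above (48 hours and 120 days), not tested recommendations:

```ini
# container-server.conf
[container-replicator]
interval = 172800        # 48 hours, to keep reclaim from holding long DB locks while asyncs drain
reclaim_age = 10368000   # 120 days

# object-server.conf
[object-replicator]
reclaim_age = 10368000   # keep reclaim_age consistent across object and container services

[object-updater]
updater_workers = 4
```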
ormandj | we're hoping that's enough to munch through these asyncs, we are way behind in this cluster | 20:39 |
ormandj | last-modified on the container db is something like july 06 heh on this one container | 20:39 |
ormandj | we stopped that script and it was over 3 million asyncs | 20:40 |
timburke | certainly a bunch, but with ssds and a tuned-down container-replicator it should be quite manageable. you've got this! | 20:44 |
openstackgerrit | Tim Burke proposed openstack/python-swiftclient master: Allow tempurl times to have units https://review.opendev.org/758500 | 21:10 |
klamath_atx | @timburke I upgraded our lab, the only weirdness i'm seeing right now is the container-reconciler having issues connecting to remote memcache servers, is that a known upgrade issue? | 21:26 |
timburke | klamath_atx, i've not seen that before :-/ | 21:29 |
klamath_atx | gotcha, just wanted to check in before i start spinning wheels | 21:33 |
mattoliverau | morning | 21:52 |
*** rcernin_ has joined #openstack-swift | 22:03 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Optimize swift-recon-cron a bit https://review.opendev.org/758505 | 22:06 |
*** rcernin_ has quit IRC | 22:19 | |
openstackgerrit | Merged openstack/python-swiftclient master: Close connections created when calling module-level functions https://review.opendev.org/721051 | 22:41 |
*** tkajinam has joined #openstack-swift | 22:59 | |
zaitcev | "Firefox can’t establish a connection to the server at review.opendev.org." | 23:58 |