*** psachin has joined #openstack-swift | 03:35 | |
*** viks___ has joined #openstack-swift | 04:55 | |
*** pcaruana has joined #openstack-swift | 05:29 | |
*** pcaruana has quit IRC | 05:37 | |
*** pcaruana has joined #openstack-swift | 05:50 | |
*** e0ne has joined #openstack-swift | 06:16 | |
*** e0ne has quit IRC | 06:17 | |
openstackgerrit | Matthew Oliver proposed openstack/swift master: sharder: Keep cleaving on empty shard ranges https://review.opendev.org/675820 | 06:34 |
*** rcernin has quit IRC | 07:13 | |
*** tesseract has joined #openstack-swift | 07:14 | |
*** e0ne has joined #openstack-swift | 07:49 | |
*** onovy has joined #openstack-swift | 08:36 | |
onovy | hi guys. I'm the current "maintainer" of swauth. I'm not doing my job :). I tried to fix swauth for Stein: https://review.opendev.org/#/c/670891/ but without success. I'm thinking about discontinuing swauth completely. Is anyone interested? | 08:38 |
patchbot | patch 670891 - x/swauth - Fix compatibility with Swift Stein - 4 patch sets | 08:38 |
*** tesseract has quit IRC | 08:42 | |
*** hogepodge has quit IRC | 08:42 | |
*** onovy has quit IRC | 08:42 | |
*** onovy has joined #openstack-swift | 08:42 | |
*** tesseract has joined #openstack-swift | 08:43 | |
*** openstackgerrit has quit IRC | 08:45 | |
*** hogepodge has joined #openstack-swift | 08:47 | |
*** hogepodge has quit IRC | 08:47 | |
*** hogepodge has joined #openstack-swift | 08:47 | |
*** irclogbot_2 has quit IRC | 08:49 | |
*** irclogbot_2 has joined #openstack-swift | 08:53 | |
*** mvkr has joined #openstack-swift | 09:47 | |
*** pcaruana has quit IRC | 10:43 | |
*** pcaruana has joined #openstack-swift | 10:43 | |
*** tdasilva has joined #openstack-swift | 11:24 | |
*** ChanServ sets mode: +v tdasilva | 11:24 | |
*** henriqueof has joined #openstack-swift | 11:42 | |
*** baojg has quit IRC | 12:06 | |
viks___ | clayg: I tried stopping the replicator service, and I noticed that the object server CPU usage comes down to almost zero. I think that means the object server process also participates in these replications, right? Does this mean CPU usage should automatically go down after a few days or weeks? | 12:35 |
viks___ | Also, currently I do not have any of the below in my object-server.conf under `[app:object-server]`: | 12:35 |
viks___ | ``` | 12:35 |
viks___ | # replication_server = false | 12:35 |
viks___ | # replication_concurrency_per_device = 1 | 12:35 |
viks___ | # replication_lock_timeout = 15 | 12:35 |
viks___ | # replication_failure_threshold = 100 | 12:35 |
viks___ | # replication_failure_ratio = 1.0 | 12:35 |
viks___ | ``` | 12:35 |
viks___ | I have a separate replication network, so do I need to set these? Their descriptions mention `SSYNC`, so I had left them out since I'm using rsync. | 12:35 |
DHE | rsync/ssync is about getting the bulk files around. Swift still signals other object servers that replication has happened so they can update local indexes, which are also used to help judge whether a node is out of sync and needs replicating (iirc) | 12:44 |
DHE | the rsync/ssync bulk transfer, and this signaling, run over the replication network IPs if so configured | 12:44 |
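For context on the options viks___ quoted above: per the sample config, the concurrency/lock/failure settings mainly govern incoming SSYNC requests, while on clusters with a dedicated replication network a common pattern is a second object-server instance bound to the replication IPs with `replication_server = true`. A minimal, hypothetical sketch (addresses and port are placeholders, not taken from this conversation):
```
# Hypothetical second object-server instance serving only replication traffic.
[DEFAULT]
bind_ip = 10.0.1.12      # replication network IP from the ring (placeholder)
bind_port = 6200

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object
# Answer only internal replication requests (REPLICATE/SSYNC) on this instance;
# rsync transfers themselves are configured separately in rsyncd.conf.
replication_server = true
```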
*** openstackgerrit has joined #openstack-swift | 13:02 | |
openstackgerrit | Thiago da Silva proposed openstack/swift master: Allow bulk delete of big SLO manifests https://review.opendev.org/540122 | 13:02 |
*** frickler has quit IRC | 13:18 | |
*** zaitcev has joined #openstack-swift | 13:29 | |
*** ChanServ sets mode: +v zaitcev | 13:29 | |
*** tdasilva has quit IRC | 13:30 | |
*** BjoernT has joined #openstack-swift | 13:59 | |
*** BjoernT_ has joined #openstack-swift | 14:04 | |
*** BjoernT has quit IRC | 14:05 | |
*** tdasilva has joined #openstack-swift | 14:07 | |
*** ChanServ sets mode: +v tdasilva | 14:07 | |
*** donnyd has joined #openstack-swift | 14:53 | |
donnyd | How can I accelerate writes in Swift using NVMe drives? Is there a mechanism to cache writes on a faster device? | 14:54 |
donnyd | Does this need to be done at a layer below swift? | 14:55 |
donnyd | I guess something like having a hot tier | 15:08 |
tdasilva | donnyd: typically faster drives are used for the account/container layer. I'm not sure I've heard of anyone actually caching writes on a prod. cluster. There was some investigation work done with CAS a few years back, might be worth looking into: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/accelerating-swift-white-paper.pdf | 15:17 |
tdasilva | donnyd: using it at the object layer just becomes really costly for a typical swift cluster, no? | 15:24 |
donnyd | Well it can, but at my scale (very small) I am trying to get better performance if I can | 15:42 |
*** gyee has joined #openstack-swift | 15:54 | |
*** e0ne has quit IRC | 16:01 | |
*** tdasilva has quit IRC | 16:09 | |
*** tdasilva has joined #openstack-swift | 16:10 | |
*** ChanServ sets mode: +v tdasilva | 16:10 | |
donnyd | I was trying to use ZFS to underpin swift because I can accelerate writes and reads from faster media... but that didn't work out so well | 16:15 |
*** zaitcev has quit IRC | 16:25 | |
*** zaitcev has joined #openstack-swift | 16:39 | |
*** ChanServ sets mode: +v zaitcev | 16:39 | |
BjoernT_ | how do I go about deleting objects inside the container database? | 17:23 |
*** BjoernT_ is now known as BjoernT | 17:23 | |
BjoernT | delete from object where name like '%c92f64f79f0d1ed01e6d5b314f04886c/008k171b%'; | 17:23 |
BjoernT | Error: no such function: chexor | 17:23 |
BjoernT | the problem is I again have corrupted object names and can't delete them via the swift API, nor update them, as that is not allowed per the trigger | 17:24 |
BjoernT | Error: UPDATE not allowed; DELETE and INSERT | 17:24 |
BjoernT | seems like chexor is a function created in memory (swift/common/db.py) when connecting to the database? | 17:26 |
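That matches swift/common/db.py: chexor is a plain Python function registered on every SQLite connection via create_function, which is why the bare sqlite3 CLI can't satisfy the container DB's DELETE/INSERT triggers. A rough illustration only (the path is a placeholder, and editing container DBs with raw SQL is still risky, as discussed below):
```
# Illustration: registering chexor on a plain sqlite3 connection, mirroring
# what swift/common/db.py does when it opens a connection. Path is a placeholder.
import sqlite3
from swift.common.db import chexor  # md5-chaining helper used by the triggers

conn = sqlite3.connect('/path/to/container.db')
conn.create_function('chexor', 3, chexor)
# With the function registered, the DELETE/INSERT triggers can now execute.
```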
*** klamath has joined #openstack-swift | 17:33 | |
*** diablo_rojo has joined #openstack-swift | 17:36 | |
*** tdasilva has quit IRC | 17:53 | |
*** tdasilva has joined #openstack-swift | 17:53 | |
*** ChanServ sets mode: +v tdasilva | 17:53 | |
clayg | BjoernT: correct, the function is needed for bookkeeping - maybe get a ContainerBroker object in a repl - and do the sql commands in python? | 18:37 |
clayg | BjoernT: I feel like you almost definitely want the updates to the object table to go through merge_items tho... | 18:39 |
clayg | if you could get the list of names and use ContainerBroker.delete_object that might be a *lot* safer | 18:40 |
clayg | ... than doing the sql/like match | 18:40 |
clayg | mattoliverau: ping p 675451 | 18:46 |
patchbot | https://review.opendev.org/#/c/675451/ - swift - Consolidate Container-Update-Override headers - 2 patch sets | 18:46 |
clayg | oh right, I forgot last week I was working on getting symlink versions func tests to "work" with use_symlinks true/false 😞 | 18:46 |
DHE | donnyd: I've considered that, and lvmcache or bcachefs is probably your best bet. lvmcache would require setting up LVM on each disk though | 18:50 |
donnyd | DHE: they don't really work quite the same way as ZFS does though. I think its probably better in this case to worry less about speed and more about reliability | 18:53 |
DHE | ZFS write cache isn't what you probably think it is | 18:53 |
DHE | unless you're dealing with small objects | 18:53 |
BjoernT | clayg I just updated the filename and didn't delete anything so that I don't have to deal with all the bookkeeping functions. Do you have an example for ContainerBroker? | 19:06 |
clayg | you *updated* the filename? the table really shouldn't allow in-place updates... replication won't be able to propagate anything unless the row/timestamp changes 😬 | 19:21 |
clayg | >>> from swift.container.backend import ContainerBroker | 19:22 |
clayg | >>> b = ContainerBroker('/srv/node4/sdb4/containers/57/576/e7c419a563cd36341b12e9ef22343576/e7c419a563cd36341b12e9ef22343576.db') | 19:22 |
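Continuing that REPL sketch, a hedged example of the delete_object route clayg suggests (the path and object name below are placeholders, not the real corrupted entries):
```
>>> from swift.common.utils import Timestamp
>>> from swift.container.backend import ContainerBroker
>>> b = ContainerBroker('/path/to/container.db')   # placeholder path
>>> for name in ['placeholder/corrupted/object']:  # substitute the real rows
...     # Writes a tombstone row with a fresh timestamp through the broker's
...     # normal merge path, so replication can propagate the change -- unlike
...     # a raw SQL UPDATE, which leaves the row timestamp untouched.
...     b.delete_object(name, Timestamp.now().internal)
```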
BjoernT | yeah, I removed the trigger and added it back, and placed the db at the primary locations; the customer will delete the container anyway | 19:23 |
clayg | ok, sounds like a plan then! | 19:25 |
*** diablo_rojo has quit IRC | 19:25 | |
BjoernT | Yes, I saw the structure around ContainerBroker but didn't see methods that help me here at first glance; put_object is probably it | 19:26 |
clayg | timburke: do you remember what we decided on x-symlink-target-etag and quotes? current patch seems to still do the strip before the check... if I remove the strip, one test that was expecting a 409 gets a 400 when it tries to verify that sending the quoted slo etag doesn't work - but that seems fine? | 19:26 |
clayg | BjoernT: any idea how the object names got corrupted? | 19:28 |
*** tesseract has quit IRC | 19:32 | |
timburke | i'm guessing the same way that the timestamps got corrupted in https://bugs.launchpad.net/swift/+bug/1823785 -- some bad bit-flip, potentially causing the name to not even be utf8 any more :-( | 19:33 |
openstack | Launchpad bug 1823785 in OpenStack Object Storage (swift) "Container replicator can propagate corrupt timestamps" [Undecided,New] | 19:33 |
BjoernT | sadly no, this is becoming a headache with a growing list of files | 19:35 |
BjoernT | not sure if the ingesting app causes the problem or swift | 19:36 |
BjoernT | https://bugs.launchpad.net/swift/+bug/1823785 would be worst case yes, I hope not | 19:36 |
openstack | Launchpad bug 1823785 in OpenStack Object Storage (swift) "Container replicator can propagate corrupt timestamps" [Undecided,New] | 19:36 |
*** psachin has quit IRC | 19:37 | |
donnyd | DHE: I am quite familiar with how ZFS works, and for this case I set the swift dataset to sync=always, which forces writes to the much faster NVMe drives. Also, commonly accessed objects would be pulled into the ARC (I think the same happens in any Linux FS though). Mainly I was trying to improve write speeds. My disks are pretty slow in comparison with the rest of the equipment I have. I am thinking you | 19:37 |
donnyd | are right though, in object storage the name of the game is reliability. Speed doesn't really matter | 19:37 |
donnyd | In a larger scale system, I really wouldn't even notice.. Its just real slow at my microscopic scale for object storage | 19:40 |
*** zaitcev has quit IRC | 19:43 | |
timburke | clayg, on the quotes thing -- i think as i started to play with it for patchset 20 i saw that test that would break and fixed up my patch to not break the test. up to you to do the strip() or not | 19:44 |
clayg | 🤔 | 19:45 |
clayg | timburke: there are a couple of req modifications going on in symlink's _check method currently | 19:48 |
DHE | donnyd: incorrect | 19:49 |
clayg | I can pull out the pop for the etag since that doesn't seem to break anything - but making that method not modify state would require a bit of moving things around and probably wouldn't last... should I rename it? | 19:49 |
donnyd | sure | 19:49 |
donnyd | which part are you thinking is incorrect | 19:50 |
clayg | timburke: well, maybe I could try to get rid of more | 19:50 |
DHE | the NVMe write cache only takes small writes (default 32k or less) and does not speed anything up. if speed matters you use sync=disabled | 19:50 |
DHE | sync=always provides the ultimate in crash protection even if the app didn't ask for it. nothing more. | 19:51 |
donnyd | DHE: So you are telling me that zfs doesn't take all writes that are synchronous and send them to zil -> then to slog if you have one? | 19:53 |
DHE | I'm saying that 1) data doesn't stay on the NVMe disk and 2) ZFS doesn't read back from the ZIL/SLOG in order to write data to the main disks at a later time | 19:54 |
DHE | the ZIL/SLOG is write-only, and only read during crash recovery when mounted | 19:54 |
DHE | so having an nvme disk is better for performance/latency than storing on the main spinning disks, but FORCING data to the nvme disk does not help anything ever | 19:55 |
*** zaitcev has joined #openstack-swift | 19:57 | |
*** ChanServ sets mode: +v zaitcev | 19:57 | |
donnyd | ??? LOL, sure | 20:03 |
clayg | timburke: the _check, _validate, and user->sys handling are a real mess, and the content-type mangling is new; I'm really having a hard time seeing the obvious way to organize it | 20:05 |
* clayg on the docstring for the path string type: | 20:05 | |
clayg | - :returns: a tuple, the full versioned path to the object and the value of | 20:05 |
clayg | - the X-Symlink-Target-Etag header which may be None | 20:05 |
clayg | + :returns: a tuple, the full versioned WSGI quoted path to the object and | 20:05 |
clayg | + the value of the X-Symlink-Target-Etag header which may be None | 20:05 |
clayg | ^ ??? | 20:05 |
*** ccamacho has quit IRC | 20:05 | |
timburke | i'd strike "quoted" -- it isn't, is it? | 20:10 |
*** zaitcev has quit IRC | 20:17 | |
*** e0ne has joined #openstack-swift | 20:26 | |
*** zaitcev has joined #openstack-swift | 20:30 | |
*** ChanServ sets mode: +v zaitcev | 20:31 | |
*** tdasilva has quit IRC | 20:37 | |
*** zaitcev_ has joined #openstack-swift | 20:39 | |
*** ChanServ sets mode: +v zaitcev_ | 20:39 | |
donnyd | DHE: So it's possible my testing is completely flawed, but I want to share some data so I can try to understand | 20:41 |
*** zaitcev has quit IRC | 20:43 | |
donnyd | WRITE: bw=801MiB/s (840MB/s), 200MiB/s-322MiB/s (210MB/s-337MB/s), io=16.0GiB (17.2GB) sync=always | 20:45 |
donnyd | WRITE: bw=321MiB/s (336MB/s), 80.2MiB/s-135MiB/s (84.1MB/s-142MB/s), io=16.0GiB (17.2GB) sync=standard | 20:45 |
*** pcaruana has quit IRC | 20:56 | |
DHE | donnyd: that's strange. only possibility that makes sense to me is if you're buffering each TCP packet rather than doing a huge fsync() when it's done... | 21:21 |
DHE | which means that sync=disabled would be even faster, though without the crash safety | 21:22 |
donnyd | Ok, that makes sense. In reality I think I am going to follow the advice I got earlier and just make something as stable as possible and not worry about it | 21:22 |
donnyd | It already blew up with zfs once. | 21:23 |
donnyd | So should I be using any sort of sw raid or just individual disks | 21:24 |
DHE | the theory behind swift is that the redundancy is already handled with swift itself, so you're better off getting the better IOPS by allowing each disk to operate independently | 21:24 |
donnyd | So should I put the 11 drives I have in raid(x) or should I just leave them in jbod and use swift | 21:25 |
DHE | assuming large-ish objects, striping of RAID disks tends to make all disks seek in unison | 21:25 |
*** tdasilva has joined #openstack-swift | 21:25 | |
*** ChanServ sets mode: +v tdasilva | 21:25 | |
*** tdasilva has quit IRC | 21:25 | |
DHE | whereas with swift the request is served by 1 spindle, which is good for multi-object performance but bad for throughput on a single object | 21:25 |
DHE | ZFS is especially bad because RAID-Z has a stripe chunk size of 1 disk sector (4k tops typically) | 21:26 |
donnyd | I will have log files, glance images, and a random assortment of desktopy files | 21:26 |
donnyd | so maybe a few raid0 groups? | 21:28 |
DHE | I still think you're best off just having individual disks, unless you really need the throughput that comes with raid-0 or another striped raid | 21:30 |
donnyd | That makes sense | 21:39 |
donnyd | Do you think it would be worthwhile to maybe do something like external journals for ext4 or xfs? | 21:40 |
DHE | it could be worth it. anything that keeps seeking down on writes I suppose... | 21:48 |
DHE | personally I want to give lvmcache a spin, but don't have a high-endurance SSD to enable writeback mode | 21:48 |
*** e0ne has quit IRC | 21:49 | |
*** diablo_rojo has joined #openstack-swift | 22:03 | |
*** henriqueof has quit IRC | 22:04 | |
*** BjoernT has quit IRC | 22:07 | |
clayg | timburke: so a POST to a hardlink will still 307 despite the etag not validating | 22:21 |
timburke | sounds right | 22:22 |
timburke | or at any rate, expected | 22:22 |
timburke | i mean -- we *could* do the POST, then validate *after* and decide whether to 307 or 409... but i don't think we *must* apply the metadata -- eventual consistency's gonna get weird otherwise | 22:25 |
clayg | don't think we must? (probably typo, cause yeah ... we have to) | 22:27 |
clayg | so, but I'm not even sure if we know enough to return the 409 ... we could go and *check* 😬 | 22:27 |
timburke | yeah, typo -- i confused myself rewriting what was a double-negative | 22:28 |
clayg | timburke: can you think of any prior art on new features announcing themselves in /info | 22:28 |
clayg | I think it's a great idea - I'm just not sure whether to call it "allowed" - would love to look at a diff that exposed /info for a non-configurable feature before? | 22:29 |
timburke | and yeah, my thought was that we could go check -- but that it'd have to be after we sent the POST and had an indication that we'd just POSTed to a hardlink | 22:29 |
timburke | on the /info thing, i don't think we've really got precedent. but it kinda sucks that clients have to know that data-segments were added to SLO in 2.17.0 | 22:31 |
clayg | yes, totally agree! it'd be a great habit to get into. | 22:31 |
*** diablo_rojo has quit IRC | 22:32 | |
clayg | but allowed/enabled sounds too much like it invites being turned off to me 🤔 | 22:32 |
clayg | available? | 22:33 |
timburke | maybe "supports_static_links"? available's OK by me, too, though | 22:34 |
clayg | maybe w/o prior art I'll ask if we can defer it to a follow-up change and maybe also do an audit of other features that deserve similar treatment? | 22:34 |
clayg | would it be ok to defer it? I could put up a placeholder patch and we could talk about it at the meeting? | 22:35 |
timburke | 👍 | 22:36 |
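For reference, the hook that feeds /info is register_swift_info; a minimal sketch of what advertising the capability from the symlink middleware's filter_factory might look like, with 'static_links' as a purely hypothetical key name (per the allowed/enabled/available debate above):
```
# Hypothetical sketch: announce a non-configurable feature in /info.
from swift.common.utils import register_swift_info

def filter_factory(global_conf, **local_conf):
    # Registered once at proxy startup; shows up under the 'symlink' key
    # in GET /info responses.
    register_swift_info('symlink', static_links=True)

    def symlink_filter(app):
        return app  # placeholder -- the real factory wraps app in the middleware
    return symlink_filter
```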
clayg | Maybe we can say a static link 307 on POST makes sense because that verb doesn't support x-if-match semantics and probably couldn't 🤔 | 22:37 |
timburke | mainly just a thought -- i feel like there's some window of diminishing returns -- in fact, at a year and a half, the data segments thing is maybe approaching the end of that window | 22:37 |
clayg | that's probably true, the fresher the feature the more clients need to assume their favorite clusters don't have it... | 22:39 |
timburke | i think the 307's pretty fair -- i was just noticing that we tell the client to go try again elsewhere without providing any of the context about it being a hardlink | 22:39 |
*** hoonetorg has quit IRC | 22:48 | |
*** hoonetorg has joined #openstack-swift | 22:50 | |
clayg | so it looks like I could throw an `x-symlink-target-etag` in the 307 response? That might be a little useful? | 22:50 |
*** hoonetorg has quit IRC | 22:57 | |
*** hoonetorg has joined #openstack-swift | 23:01 | |
openstackgerrit | Clay Gerrard proposed openstack/swift master: Allow "static symlinks" https://review.opendev.org/633094 | 23:03 |
*** rcernin has joined #openstack-swift | 23:11 |