*** psachin has joined #openstack-swift | 03:35 | |
*** viks___ has joined #openstack-swift | 04:55 | |
*** pcaruana has joined #openstack-swift | 05:29 | |
*** pcaruana has quit IRC | 05:37 | |
*** pcaruana has joined #openstack-swift | 05:50 | |
*** e0ne has joined #openstack-swift | 06:16 | |
*** e0ne has quit IRC | 06:17 | |
openstackgerrit | Matthew Oliver proposed openstack/swift master: sharder: Keep cleaving on empty shard ranges https://review.opendev.org/675820 | 06:34 |
*** rcernin has quit IRC | 07:13 | |
*** tesseract has joined #openstack-swift | 07:14 | |
*** e0ne has joined #openstack-swift | 07:49 | |
*** onovy has joined #openstack-swift | 08:36 | |
onovy | hi guys. I'm the current "maintainer" of swauth. I'm not doing my job :). I tried to fix swauth for Stein: https://review.opendev.org/#/c/670891/ but without success. I'm thinking about discontinuing swauth completely. Is anyone interested? | 08:38 |
patchbot | patch 670891 - x/swauth - Fix compatibility with Swift Stein - 4 patch sets | 08:38 |
*** tesseract has quit IRC | 08:42 | |
*** hogepodge has quit IRC | 08:42 | |
*** onovy has quit IRC | 08:42 | |
*** onovy has joined #openstack-swift | 08:42 | |
*** tesseract has joined #openstack-swift | 08:43 | |
*** openstackgerrit has quit IRC | 08:45 | |
*** hogepodge has joined #openstack-swift | 08:47 | |
*** hogepodge has quit IRC | 08:47 | |
*** hogepodge has joined #openstack-swift | 08:47 | |
*** irclogbot_2 has quit IRC | 08:49 | |
*** irclogbot_2 has joined #openstack-swift | 08:53 | |
*** mvkr has joined #openstack-swift | 09:47 | |
*** pcaruana has quit IRC | 10:43 | |
*** pcaruana has joined #openstack-swift | 10:43 | |
*** tdasilva has joined #openstack-swift | 11:24 | |
*** ChanServ sets mode: +v tdasilva | 11:24 | |
*** henriqueof has joined #openstack-swift | 11:42 | |
*** baojg has quit IRC | 12:06 | |
viks___ | clayg: I tried stopping the replicator service, and I noticed that the object server CPU usage comes down to almost zero. I think that means the object server process also participates in these replications, right? Does this mean CPU usage should automatically go down after a few days or weeks? | 12:35 |
viks___ | Also, currently I do not have any of the below in my object-server.conf under `[app:object-server]`: | 12:35 |
viks___ | ``` | 12:35 |
viks___ | # replication_server = false | 12:35 |
viks___ | # replication_concurrency_per_device = 1 | 12:35 |
viks___ | # replication_lock_timeout = 15 | 12:35 |
viks___ | # replication_failure_threshold = 100 | 12:35 |
viks___ | # replication_failure_ratio = 1.0 | 12:35 |
viks___ | ``` | 12:35 |
viks___ | I have a separate replication network, so do I need to set these? Their descriptions mention `SSYNC`, so I had left them out since I'm using rsync. | 12:35 |
DHE | rsync/ssync is about getting the bulk files around. Swift still signals other object servers that replication has happened so they can update local indexes, which are also used to help judge whether a node is out of sync and needs replicating (iirc) | 12:44 |
DHE | the rsync/ssync bulk transfer, and this signaling, run over the replication network IPs if so configured | 12:44 |
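For context on the options viks___ quoted above: per the sample config, the concurrency/lock/failure settings mainly govern incoming SSYNC requests, while on clusters with a dedicated replication network a common pattern is a second object-server instance bound to the replication IPs with `replication_server = true`. A minimal, hypothetical sketch (addresses and port are placeholders, not taken from this conversation):
```
# Hypothetical second object-server instance serving only replication traffic.
[DEFAULT]
bind_ip = 10.0.1.12      # replication network IP from the ring (placeholder)
bind_port = 6200

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object
# Answer only internal replication requests (REPLICATE/SSYNC) on this instance;
# rsync transfers themselves are configured separately in rsyncd.conf.
replication_server = true
```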
*** openstackgerrit has joined #openstack-swift | 13:02 | |
openstackgerrit | Thiago da Silva proposed openstack/swift master: Allow bulk delete of big SLO manifests https://review.opendev.org/540122 | 13:02 |
*** frickler has quit IRC | 13:18 | |
*** zaitcev has joined #openstack-swift | 13:29 | |
*** ChanServ sets mode: +v zaitcev | 13:29 | |
*** tdasilva has quit IRC | 13:30 | |
*** BjoernT has joined #openstack-swift | 13:59 | |
*** BjoernT_ has joined #openstack-swift | 14:04 | |
*** BjoernT has quit IRC | 14:05 | |
*** tdasilva has joined #openstack-swift | 14:07 | |
*** ChanServ sets mode: +v tdasilva | 14:07 | |
*** donnyd has joined #openstack-swift | 14:53 | |
donnyd | How can I accelerate writes in Swift using NVMe drives? Is there a mechanism to cache writes on a faster device? | 14:54 |
donnyd | Does this need to be done at a layer below swift? | 14:55 |
donnyd | I guess something like having a hot tier | 15:08 |
tdasilva | donnyd: typically faster drives are used for the account/container layer. I'm not sure I've heard of anyone actually caching writes on a prod. cluster. There was some investigation work done with CAS a few years back, might be worth looking into: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/accelerating-swift-white-paper.pdf | 15:17 |
tdasilva | donnyd: using it at the object layer just becomes really costly for a typical swift cluster, no? | 15:24 |
donnyd | Well it can, but at my scale (very small) I am trying to get better performance if I can | 15:42 |
*** gyee has joined #openstack-swift | 15:54 | |
*** e0ne has quit IRC | 16:01 | |
*** tdasilva has quit IRC | 16:09 | |
*** tdasilva has joined #openstack-swift | 16:10 | |
*** ChanServ sets mode: +v tdasilva | 16:10 | |
donnyd | I was trying to use ZFS to underpin swift because I can accelerate writes and reads from faster media... but that didn't work out so well | 16:15 |
*** zaitcev has quit IRC | 16:25 | |
*** zaitcev has joined #openstack-swift | 16:39 | |
*** ChanServ sets mode: +v zaitcev | 16:39 | |
BjoernT_ | how do I go about deleting objects inside the container database? | 17:23 |
*** BjoernT_ is now known as BjoernT | 17:23 | |
BjoernT | delete from object where name like '%c92f64f79f0d1ed01e6d5b314f04886c/008k171b%'; | 17:23 |
BjoernT | Error: no such function: chexor | 17:23 |
BjoernT | the problem is I again have corrupted object names and can't delete them via the swift API, nor update them, as that is not allowed per the trigger | 17:24 |
BjoernT | Error: UPDATE not allowed; DELETE and INSERT | 17:24 |
BjoernT | seems like chexor is a function created in memory (swift/common/db.py) when connecting to the database? | 17:26 |
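That matches swift/common/db.py: chexor is a plain Python function registered on every SQLite connection via create_function, which is why the bare sqlite3 CLI can't satisfy the container DB's DELETE/INSERT triggers. A rough illustration only (the path is a placeholder, and editing container DBs with raw SQL is still risky, as discussed below):
```
# Illustration: registering chexor on a plain sqlite3 connection, mirroring
# what swift/common/db.py does when it opens a connection. Path is a placeholder.
import sqlite3
from swift.common.db import chexor  # md5-chaining helper used by the triggers

conn = sqlite3.connect('/path/to/container.db')
conn.create_function('chexor', 3, chexor)
# With the function registered, the DELETE/INSERT triggers can now execute.
```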
*** klamath has joined #openstack-swift | 17:33 | |
*** diablo_rojo has joined #openstack-swift | 17:36 | |
*** tdasilva has quit IRC | 17:53 | |
*** tdasilva has joined #openstack-swift | 17:53 | |
*** ChanServ sets mode: +v tdasilva | 17:53 | |
clayg | BjoernT: correct, the function is needed for bookkeeping - maybe get a ContainerBroker object in a repl - and do the sql commands in python? | 18:37 |
clayg | BjoernT: I feel like you almost definitely want the updates to the object table to go through merge_items tho... | 18:39 |
clayg | if you could get the list of names and use ContainerBroker.delete_object that might be a *lot* safer | 18:40 |
clayg | ... than doing the sql/like match | 18:40 |
clayg | mattoliverau: ping p 675451 | 18:46 |
patchbot | https://review.opendev.org/#/c/675451/ - swift - Consolidate Container-Update-Override headers - 2 patch sets | 18:46 |
clayg | oh right, I forgot last week I was working on getting symlink versions func tests to "work" with use_symlinks true/false 😞 | 18:46 |
DHE | donnyd: I've considered that, and lvmcache or bcachefs is probably your best bet. lvmcache would require setting up LVM on each disk though | 18:50 |
donnyd | DHE: they don't really work quite the same way as ZFS does though. I think its probably better in this case to worry less about speed and more about reliability | 18:53 |
DHE | ZFS write cache isn't what you probably think it is | 18:53 |
DHE | unless you're dealing with small objects | 18:53 |
BjoernT | clayg I just updated the filename and didn't delete anything so that I don't have to deal with all the bookkeeping functions. Do you have an example for ContainerBroker? | 19:06 |
clayg | you *updated* the filename? the table really shouldn't allow in-place updates... replication won't be able to propagate anything unless the row/timestamp changes 😬 | 19:21 |
clayg | >>> from swift.container.backend import ContainerBroker | 19:22 |
clayg | >>> b = ContainerBroker('/srv/node4/sdb4/containers/57/576/e7c419a563cd36341b12e9ef22343576/e7c419a563cd36341b12e9ef22343576.db') | 19:22 |
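Continuing that REPL sketch, a hedged example of the delete_object route clayg suggests (the path and object name below are placeholders, not the real corrupted entries):
```
>>> from swift.common.utils import Timestamp
>>> from swift.container.backend import ContainerBroker
>>> b = ContainerBroker('/path/to/container.db')   # placeholder path
>>> for name in ['placeholder/corrupted/object']:  # substitute the real rows
...     # Writes a tombstone row with a fresh timestamp through the broker's
...     # normal merge path, so replication can propagate the change -- unlike
...     # a raw SQL UPDATE, which leaves the row timestamp untouched.
...     b.delete_object(name, Timestamp.now().internal)
```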
BjoernT | yeah, I removed the trigger and added it back, and placed the db at the primary locations; the customer will delete the container anyway | 19:23 |
clayg | ok, sounds like a plan then! | 19:25 |
*** diablo_rojo has quit IRC | 19:25 | |
BjoernT | Yes, I saw the structure around ContainerBroker but didn't see methods that help me here at first glance; put_object is probably it | 19:26 |
clayg | timburke: do you remember what we decided on x-symlink-target-etag and quotes? current patch seems to still do the strip before the check... if I remove the strip, one test that was expecting a 409 gets a 400 when it tries to verify that sending the quoted slo etag doesn't work - but that seems fine? | 19:26 |
clayg | BjoernT: any idea how the object names got corrupted? | 19:28 |
*** tesseract has quit IRC | 19:32 | |
timburke | i'm guessing the same way that the timestamps got corrupted in https://bugs.launchpad.net/swift/+bug/1823785 -- some bad bit-flip, potentially causing the name to not even be utf8 any more :-( | 19:33 |
openstack | Launchpad bug 1823785 in OpenStack Object Storage (swift) "Container replicator can propagate corrupt timestamps" [Undecided,New] | 19:33 |
BjoernT | sadly no, this is becoming a headache with a growing list of files | 19:35 |
BjoernT | not sure if the ingesting app causes the problem or swift | 19:36 |
BjoernT | https://bugs.launchpad.net/swift/+bug/1823785 would be worst case yes, I hope not | 19:36 |
openstack | Launchpad bug 1823785 in OpenStack Object Storage (swift) "Container replicator can propagate corrupt timestamps" [Undecided,New] | 19:36 |
*** psachin has quit IRC | 19:37 | |
donnyd | DHE: I am quite familiar with how ZFS works, and for this case I set the swift dataset to sync=always, which forces writes to the much faster NVMe drives. Also, commonly accessed objects would be pulled into the ARC (I think the same happens in any Linux FS though). Mainly I was trying to improve write speeds. My disks are pretty slow in comparison with the rest of the equipment I have. I am thinking you | 19:37 |
donnyd | are right though, in object storage the name of the game is reliability. Speed doesn't really matter | 19:37 |
donnyd | In a larger scale system, I really wouldn't even notice.. Its just real slow at my microscopic scale for object storage | 19:40 |
*** zaitcev has quit IRC | 19:43 | |
timburke | clayg, on the quotes thing -- i think as i started to play with it for patchset 20 i saw that test that would break and fixed up my patch to not break the test. up to you to do the strip() or not | 19:44 |
clayg | 🤔 | 19:45 |
clayg | timburke: there are a couple of req modifications going on in symlink's _check method currently | 19:48 |
DHE | donnyd: incorrect | 19:49 |
clayg | I can pull out the pop for the etag since that doesn't seem to break anything - but making that method not modify state would require a bit of moving things around and probably wouldn't last... should I rename it? | 19:49 |
donnyd | sure | 19:49 |
donnyd | which part are you thinking is incorrect | 19:50 |
clayg | timburke: well, maybe I could try to get rid of more | 19:50 |
DHE | the NVMe write cache only takes small writes (default 32k or less) and does not speed anything up. if speed matters you use sync=disabled | 19:50 |
DHE | sync=always provides the ultimate in crash protection even if the app didn't ask for it. nothing more. | 19:51 |
donnyd | DHE: So you are telling me that zfs doesn't take all writes that are synchronous and send them to zil -> then to slog if you have one? | 19:53 |
DHE | I'm saying that 1) data doesn't stay on the NVMe disk and 2) ZFS doesn't read back from the ZIL/SLOG in order to write data to the main disks at a later time | 19:54 |
DHE | the ZIL/SLOG is write-only, and only read during crash recovery when mounted | 19:54 |
DHE | so having an nvme disk is better for performance/latency than storing on the main spinning disks, but FORCING data to the nvme disk does not help anything ever | 19:55 |
*** zaitcev has joined #openstack-swift | 19:57 | |
*** ChanServ sets mode: +v zaitcev | 19:57 | |
donnyd | ??? LOL, sure | 20:03 |
clayg | timburke: the _check, _validate, and user->sys handling are a real mess, and the content-type mangling is new; I'm really having a hard time seeing the obvious way to organize it | 20:05 |
* clayg on the docstring for the path string type: | 20:05 | |
clayg | - :returns: a tuple, the full versioned path to the object and the value of | 20:05 |
clayg | - the X-Symlink-Target-Etag header which may be None | 20:05 |
clayg | + :returns: a tuple, the full versioned WSGI quoted path to the object and | 20:05 |
clayg | + the value of the X-Symlink-Target-Etag header which may be None | 20:05 |
clayg | ^ ??? | 20:05 |
*** ccamacho has quit IRC | 20:05 | |
timburke | i'd strike "quoted" -- it isn't, is it? | 20:10 |
*** zaitcev has quit IRC | 20:17 | |
*** e0ne has joined #openstack-swift | 20:26 | |
*** zaitcev has joined #openstack-swift | 20:30 | |
*** ChanServ sets mode: +v zaitcev | 20:31 | |
*** tdasilva has quit IRC | 20:37 | |
*** zaitcev_ has joined #openstack-swift | 20:39 | |
*** ChanServ sets mode: +v zaitcev_ | 20:39 | |
donnyd | DHE: So it's possible my testing is completely flawed, but I want to share some data so I can try to understand | 20:41 |
*** zaitcev has quit IRC | 20:43 | |
donnyd | WRITE: bw=801MiB/s (840MB/s), 200MiB/s-322MiB/s (210MB/s-337MB/s), io=16.0GiB (17.2GB) sync=always | 20:45 |
donnyd | WRITE: bw=321MiB/s (336MB/s), 80.2MiB/s-135MiB/s (84.1MB/s-142MB/s), io=16.0GiB (17.2GB) sync=standard | 20:45 |
*** pcaruana has quit IRC | 20:56 | |
DHE | donnyd: that's strange. only possibility that makes sense to me is if you're buffering each TCP packet rather than doing a huge fsync() when it's done... | 21:21 |
DHE | which means that sync=disabled would be even faster, though without the crash safety | 21:22 |
donnyd | Ok, that makes sense. In reality I think I am going to follow the advice I got earlier and just make something as stable as possible and not worry about it | 21:22 |
donnyd | It already blew up with zfs once. | 21:23 |
donnyd | So should I be using any sort of sw raid or just individual disks | 21:24 |
DHE | the theory behind swift is that the redundancy is already handled with swift itself, so you're better off getting the better IOPS by allowing each disk to operate independently | 21:24 |
donnyd | So should I put the 11 drives I have in raid(x) or should I just leave them in jbod and use swift | 21:25 |
DHE | assuming large-ish objects, striping of RAID disks tends to make all disks seek in unison | 21:25 |
*** tdasilva has joined #openstack-swift | 21:25 | |
*** ChanServ sets mode: +v tdasilva | 21:25 | |
*** tdasilva has quit IRC | 21:25 | |
DHE | whereas with swift the request is served by 1 spindle, which is good for multi-object performance but bad for throughput on a single object | 21:25 |
DHE | ZFS is especially bad because RAID-Z has a stripe chunk size of 1 disk sector (4k tops typically) | 21:26 |
donnyd | I will have log files, glance images, and a random assortment of desktopy files | 21:26 |
donnyd | so maybe a few raid0 groups? | 21:28 |
DHE | I still think you're best off just having individual disks, unless you really need the throughput that comes with raid-0 or another striped raid | 21:30 |
donnyd | That makes sense | 21:39 |
donnyd | Do you think it would be worthwhile to maybe do something like external journals for ext4 or xfs? | 21:40 |
DHE | it could be worth it. anything that keeps seeking down on writes I suppose... | 21:48 |
DHE | personally I want to give lvmcache a spin, but don't have a high-endurance SSD to enable writeback mode | 21:48 |
*** e0ne has quit IRC | 21:49 | |
*** diablo_rojo has joined #openstack-swift | 22:03 | |
*** henriqueof has quit IRC | 22:04 | |
*** BjoernT has quit IRC | 22:07 | |
clayg | timburke: so a POST to a hardlink will still 307 despite the etag not validating | 22:21 |
timburke | sounds right | 22:22 |
timburke | or at any rate, expected | 22:22 |
timburke | i mean -- we *could* do the POST, then validate *after* and decide whether to 307 or 409... but i don't think we *must* apply the metadata -- eventual consistency's gonna get weird otherwise | 22:25 |
clayg | don't think we must? (probably typo, cause yeah ... we have to) | 22:27 |
clayg | so, but I'm not even sure if we know enough to return the 409 ... we could go and *check* 😬 | 22:27 |
timburke | yeah, typo -- i confused myself rewriting what was a double-negative | 22:28 |
clayg | timburke: can you think of any prior art on new features announcing themselves in /info | 22:28 |
clayg | I think it's a great idea - I'm just not sure whether to call it "allowed" - would love to look at a diff that exposed /info for a non-configurable feature before? | 22:29 |
timburke | and yeah, my thought was that we could go check -- but that it'd have to be after we sent the POST and had an indication that we'd just POSTed to a hardlink | 22:29 |
timburke | on the /info thing, i don't think we've really got precedent. but it kinda sucks that clients have to know that data-segments were added to SLO in 2.17.0 | 22:31 |
clayg | yes, totally agree! it'd be a great habit to get into. | 22:31 |
*** diablo_rojo has quit IRC | 22:32 | |
clayg | but allowed/enabled sounds too much like it invites being turned off to me 🤔 | 22:32 |
clayg | available? | 22:33 |
timburke | maybe "supports_static_links"? available's OK by me, too, though | 22:34 |
clayg | maybe w/o prior art I'll ask if we can defer it to a follow-up change and maybe also do an audit of other features that deserve similar treatment? | 22:34 |
clayg | would it be ok to defer it? I could put up a placeholder patch and we could talk about it at the meeting? | 22:35 |
timburke | 👍 | 22:36 |
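For reference, the hook that feeds /info is register_swift_info; a minimal sketch of what advertising the capability from the symlink middleware's filter_factory might look like, with 'static_links' as a purely hypothetical key name (per the allowed/enabled/available debate above):
```
# Hypothetical sketch: announce a non-configurable feature in /info.
from swift.common.utils import register_swift_info

def filter_factory(global_conf, **local_conf):
    # Registered once at proxy startup; shows up under the 'symlink' key
    # in GET /info responses.
    register_swift_info('symlink', static_links=True)

    def symlink_filter(app):
        return app  # placeholder -- the real factory wraps app in the middleware
    return symlink_filter
```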
clayg | Maybe we can say a static link 307 on POST makes sense because that verb doesn't support x-if-match semantics and probably couldn't 🤔 | 22:37 |
timburke | mainly just a thought -- i feel like there's some window of diminishing returns -- in fact, at a year and a half, the data segments thing is maybe approaching the end of that window | 22:37 |
clayg | that's probably true, the fresher the feature the more clients need to assume their favorite clusters don't have it... | 22:39 |
timburke | i think the 307's pretty fair -- i was just noticing that we tell the client to go try again elsewhere without providing any of the context about it being a hardlink | 22:39 |
*** hoonetorg has quit IRC | 22:48 | |
*** hoonetorg has joined #openstack-swift | 22:50 | |
clayg | so it looks like I could throw an `x-symlink-target-etag` in the 307 response? That might be a little useful? | 22:50 |
*** hoonetorg has quit IRC | 22:57 | |
*** hoonetorg has joined #openstack-swift | 23:01 | |
openstackgerrit | Clay Gerrard proposed openstack/swift master: Allow "static symlinks" https://review.opendev.org/633094 | 23:03 |
*** rcernin has joined #openstack-swift | 23:11 |