timburke | stay safe mattoliverau! GL! | 00:18 |
*** gyee has quit IRC | 01:08 | |
openstackgerrit | Xuan Yandong proposed openstack/swift-bench master: Remove six Replace the following items with Python 3 style code. https://review.opendev.org/755151 | 01:21 |
*** clayg has quit IRC | 02:14 | |
*** tkajinam has quit IRC | 02:14 | |
*** StevenK has quit IRC | 02:14 | |
*** mattoliverau has quit IRC | 02:14 | |
*** tkajinam_ has joined #openstack-swift | 02:15 | |
*** StevenK has joined #openstack-swift | 02:15 | |
*** clayg has joined #openstack-swift | 02:15 | |
*** ChanServ sets mode: +v clayg | 02:15 | |
*** mattoliverau has joined #openstack-swift | 02:20 | |
*** tepper.freenode.net sets mode: +v mattoliverau | 02:20 | |
*** viks____ has joined #openstack-swift | 02:28 | |
*** rcernin has quit IRC | 02:55 | |
*** rcernin_ has joined #openstack-swift | 02:56 | |
*** psachin has joined #openstack-swift | 03:31 | |
*** psachin has quit IRC | 03:32 | |
*** psachin has joined #openstack-swift | 03:33 | |
*** m75abrams has joined #openstack-swift | 04:22 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-swift | 04:33 | |
*** mikecmpbll has joined #openstack-swift | 04:37 | |
openstackgerrit | Xuan Yandong proposed openstack/swift-bench master: Remove six and py27 tox Replace the following items with Python 3 style code. https://review.opendev.org/755151 | 06:06 |
openstackgerrit | Xuan Yandong proposed openstack/swift-bench master: Remove six and py27 tox https://review.opendev.org/755151 | 06:43 |
*** rcernin_ has quit IRC | 07:06 | |
*** rcernin_ has joined #openstack-swift | 07:17 | |
*** rcernin_ has quit IRC | 07:20 | |
*** rcernin has joined #openstack-swift | 07:20 | |
*** mikecmpbll has joined #openstack-swift | 08:06 | |
*** rcernin has quit IRC | 08:48 | |
openstackgerrit | wu.shiming proposed openstack/swift master: requirements: Drop os-testr https://review.opendev.org/755232 | 08:53 |
*** ab-a has quit IRC | 09:35 | |
*** ab-a has joined #openstack-swift | 09:36 | |
openstackgerrit | Merged openstack/swift stable/train: py3: Fix swift-dispersion-populate https://review.opendev.org/754853 | 10:02 |
*** StevenK has quit IRC | 10:58 | |
*** StevenK has joined #openstack-swift | 10:58 | |
*** rcernin has joined #openstack-swift | 11:50 | |
*** rcernin has quit IRC | 12:16 | |
*** tkajinam_ has quit IRC | 13:12 | |
*** m75abrams has quit IRC | 13:57 | |
*** gyee has joined #openstack-swift | 14:59 | |
*** ozzzo has joined #openstack-swift | 15:19 | |
*** Hamidreza has joined #openstack-swift | 15:24 | |
Hamidreza | Hi | 15:25 |
Hamidreza | I've a question about openstack swift storage | 15:25 |
Hamidreza | I added 20 disks to my cluster nodes and then updated the ring. Now it should rebalance the data, but it didn't do that!!! | 15:25 |
Hamidreza | what should i do? | 15:25 |
timburke | Hamidreza, have you checked that the object-replicator is running on all nodes? | 15:27 |
Hamidreza | I checked the object-replicator and even the rsync process | 15:28 |
Hamidreza | and they were working | 15:28 |
ormandj | we see a lot of intermittent ConnectionTimeouts to backend servers (using servers_per_port of 2) - I would have expected slowness, but not connection timeouts, if disks are saturated. is this expected with ussuri? | 15:28 |
Hamidreza | and I even increased the number of processes | 15:28 |
timburke | i just saw http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017675.html -- note that you *won't* want to keep the fs read-only; the replicator needs to be able to delete data that no longer belongs on that disk | 15:30 |
Hamidreza | ok, what can I do? | 15:32 |
timburke | Hamidreza, you may want to look at the handoffs_first and handoff_delete options: https://github.com/openstack/swift/blob/2.26.0/etc/object-server.conf-sample#L287-L304 | 15:32 |
timburke | if things have been fairly healthy, you should be fine to set handoffs_first=True and handoff_delete=1, restart the replicators, and wait for a replication cycle -- the full drives should start draining fairly quickly at that point | 15:34 |
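For reference, a minimal sketch of those settings in object-server.conf (option names as in the sample config linked above; the values are the ones suggested here, not the defaults):

```ini
[object-replicator]
# process partitions that do NOT belong on this node first,
# so overfull drives start draining immediately
handoffs_first = True
# remove a handoff partition once at least this many replicas
# have confirmed receipt, instead of waiting for all of them
handoff_delete = 1
```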
timburke | odds are, you'll be limited by the iops of the new drives | 15:34 |
Hamidreza | I don't want to start the object-replicator | 15:36 |
Hamidreza | I stopped it before | 15:36 |
Hamidreza | because one day I suddenly saw all of my disks getting broken, one by one | 15:37 |
Hamidreza | I think they were under high pressure | 15:37 |
Hamidreza | so I disabled the object-replicator | 15:38 |
Hamidreza | after that day none of my disks got broken!!! | 15:38 |
timburke | "broken" how? the replicators (and, if using erasure coding, reconstructors) are how swift (1) ensures that data remains durable even in the face of failing drives and (2) moves data as part of expansions so that drives don't fill up. it's a vital part of your swift deployment | 15:42 |
*** openstackgerrit has quit IRC | 15:46 | |
Hamidreza | (2) yeah, this is a vital part of a swift deployment, but it didn't work for me. it should rebalance the disks and move data from the full disks to the empty ones | 15:47 |
*** mikecmpbll has quit IRC | 15:53 | |
*** mikecmpbll has joined #openstack-swift | 15:54 | |
*** Hamidreza has quit IRC | 16:02 | |
*** psachin has quit IRC | 16:34 | |
clayg | Hamidreza: maybe try changing the ionice_priority setting for the object-replicator with rather low concurrency and handoffs_first=True while monitoring your devices with iostat | 16:50 |
clayg | hopefully you can find a balance that allows your disks to service the io needs of your client-facing traffic as well as the consistency engine's io needs for background work | 16:50 |
clayg | in an emergency you can also turn off other processes like the object-auditor | 16:50 |
clayg | if you run container resources on the same disks as your object devices that can put a lot of pressure on those disks as well | 16:51 |
clayg | dedicated ssd's are best | 16:51 |
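A sketch of the throttled-replicator idea clayg describes, again for object-server.conf (the values are illustrative starting points, not recommendations):

```ini
[object-replicator]
# a single worker caps background disk load
concurrency = 1
handoffs_first = True
# run replication io at best-effort's lowest priority so
# client-facing requests win; see `man ionice` for semantics
ionice_class = IOPRIO_CLASS_BE
ionice_priority = 7
```

Watching `iostat -dxm 5` while tuning, as suggested, shows whether the disks still have headroom for client traffic.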
clayg | timburke: so my config tests are having problems with the dlo middleware trying to reparse the config for legacy options... | 16:52 |
clayg | I'm sure I can get the tests passing - but i'm not looking forward to an overhaul of staticweb | 16:52 |
clayg | is there maybe a better idea than message passing via the request environ for how SLO can signal to proxy-logging that an error occurred during the iterator? | 16:54 |
clayg | i feel like catch_errors and proxy-logging are starting to kinda team up or converge when it comes to watchdogging the iterators on content length 🤔 | 16:55 |
timburke | ormandj, connection timeouts aren't so surprising, especially if it's a busy cluster. one of the things that can happen is the object-server gets stuck waiting on disk IO, so incoming connections can't be accepted. kernel will queue some of them, but eventually, the connect will block. you can use something like `lsof -a -u swift -i -s TCP:LISTEN -T q` to check in on your listen queue depths | 16:55 |
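If `ss` is handier than `lsof`, something like this shows the same listen-queue depths (assuming the default object-server base port of 6200; older clusters use 6000):

```sh
# for listening sockets, Recv-Q is the current accept-queue depth
# and Send-Q is the configured backlog limit
ss -ltn '( sport >= :6200 and sport <= :6299 )'
```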
ormandj | timburke: yeah, is there any way around it? tl;dr, we've had to lower replication workers to almost nothing and are only doing about 40MB/s to the 'new' server, and it's still causing every client massive problems with ~8 out of 56 drives per server fairly iops-saturated | 16:56 |
ormandj | customers would be fine with slow, but broken is not so much. we could increase conn_timeout, but historically, that really hasn't helped | 16:57 |
ormandj | we're about to try servers_per_port = 4 instead of 2 in hopes it will help | 16:57 |
ormandj | lesson learned, don't deploy with fewer than 20 nodes in the future, but trying to figure out a way out of this pickle in the meantime hah | 16:58 |
ormandj | i think we're down at 12 replication workers atm | 16:59 |
ormandj | see some read queues of 20-30ish | 17:07 |
ormandj | using ss, read/send queues at 127/128 | 17:08 |
*** openstackgerrit has joined #openstack-swift | 17:41 | |
openstackgerrit | Clay Gerrard proposed openstack/swift master: Test proxy-server.conf-sample https://review.opendev.org/755087 | 17:41 |
openstackgerrit | Clay Gerrard proposed openstack/swift master: Add staticweb to default pipeline https://review.opendev.org/755132 | 17:47 |
openstackgerrit | Clay Gerrard proposed openstack/swift master: Log error processing manifest as ServerError https://review.opendev.org/752770 | 17:47 |
*** recyclehero has joined #openstack-swift | 17:59 | |
recyclehero | hi | 17:59 |
recyclehero | consider horizon, keystone, and the databases gone. swift-proxy and the account/container/object servers are available. | 18:01 |
recyclehero | wondering if there's any hope of recovering the data, or if I should just burn the thing | 18:01 |
DHE | recovery how? | 18:03 |
recyclehero | my queens infra was very unstable, mostly hardware problems. going to deploy a new ussuri with kolla | 18:03 |
recyclehero | what should I do with my swift? it's something good to know for planning DR later | 18:04 |
recyclehero | DHE: getting files | 18:04 |
recyclehero | objects sorry | 18:04 |
DHE | the official thing I could suggest is making a project with the same project ID as the old one. however looking at my (admittedly old) version of the openstack cli tool there isn't a means to select your own uuid so you might have to go into the keystone DB post creation and change it | 18:05 |
DHE | authentication is really by your project membership. other than using read/write ACLs swift doesn't care much about individual users | 18:05 |
DHE | I built my swift cluster to have minimal keystone dependencies, so there are tempurl secrets everywhere and most programs that would authenticate use that instead. really keystone is only needed for deletion (that might be beatable, but I'm using bulk delete so not bothering) and container/object listings | 18:06 |
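For anyone following along, the tempurl flow DHE describes looks roughly like this (hypothetical key and path; `swift tempurl` ships with python-swiftclient):

```sh
# set a secret on the account, then mint a signed URL good for 24 hours
swift post -m 'Temp-URL-Key:mysecret'
swift tempurl GET 86400 /v1/AUTH_bob/photos/vacation.jpg mysecret
```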
recyclehero | DHE: aha, but then I'd have to use swift to get them back. is there a way I can assemble the files from the swift dbs myself? | 18:07 |
DHE | well all the swift dbs under the account and container directories on your disks are sqlite so you can absolutely stick your nose in there with the sqlite tool | 18:07 |
recyclehero | underlying fs + swift dbs? | 18:08 |
DHE | if you know the object URLs you want there's swift-get-nodes which will take the ring file and path name and provide both full server+paths, and CURL commands to get some data | 18:09 |
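For example (hypothetical account/container/object names; the ring path assumes the default /etc/swift):

```sh
swift-get-nodes /etc/swift/object.ring.gz AUTH_bob photos vacation.jpg
```

It prints each primary and handoff node along with the on-disk paths and ready-made curl commands.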
recyclehero | files are made into chunks, if I'm correct. somehow one should assemble them back, right? | 18:09 |
timburke | your swift data is still safe and sound; the difficult part will be finding it. as DHE suggests, if you can get the project IDs to match between old and new, it'll all be there and available. if matching project IDs isn't feasible, you should still be able to create a "reseller admin" user that will have full access to any account; that user could then do server side copies of data to a new location, then clean up the old data | 18:09 |
DHE | if you're using EC, yes they are. if you're using multi-way replication each file is fully intact | 18:09 |
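On the reseller-admin route timburke mentions: with keystoneauth, that boils down to granting a user whatever role is configured as reseller_admin_role (ResellerAdmin by default). A sketch with hypothetical user/project names:

```sh
openstack role create ResellerAdmin
openstack role add --user recovery-admin --project admin ResellerAdmin
```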
recyclehero | timburke: nice. I read that OpenStack was made from nova and swift, by NASA and Rackspace respectively. I like swift | 18:11 |
timburke | us too :-) | 18:12 |
recyclehero | and can I do all of that from the cmd now? | 18:12 |
recyclehero | sure. thanks :-) | 18:13 |
timburke | the cardinality of accounts is usually fairly small -- i'd probably write a little script to just walk the account disks on each node and build a list of all accounts in the cluster. then you'd need to figure out which account was whose and make a mapping from old account to new account, which may be non-trivial. then script a data-mover and wait a while | 18:16 |
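A rough sketch of such a script (assumes the default /srv/node layout and the account_stat table that swift's account DBs carry; run it on each storage node):

```python
#!/usr/bin/env python
# Rough sketch: list every account present on this node's disks.
# Adjust DEVICES if your drives are mounted somewhere other than /srv/node.
import glob
import sqlite3

DEVICES = '/srv/node'

accounts = set()
for db_path in glob.glob('%s/*/accounts/*/*/*/*.db' % DEVICES):
    try:
        conn = sqlite3.connect(db_path)
        row = conn.execute('SELECT account FROM account_stat').fetchone()
        conn.close()
        if row:
            accounts.add(row[0])
    except sqlite3.DatabaseError:
        pass  # skip partial or corrupt DBs

for account in sorted(accounts):
    print(account)
```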
timburke | might be worth doing periodic (encrypted) backups of the keystone db into swift ;-) then i think you could just restore from the backup and be up and running again | 18:17 |
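A sketch of that backup idea (hypothetical names; assumes mysql, gpg, and a python-swiftclient new enough to upload from stdin):

```sh
mysqldump keystone \
    | gpg --encrypt --recipient ops@example.com \
    | swift upload keystone-backups --object-name "keystone-$(date +%F).sql.gpg" -
```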
recyclehero | thanks again, I will start and come back when I have some progress | 18:23 |
openstackgerrit | Romain LE DISEZ proposed openstack/swift master: Fix a race condition in case of cross-replication https://review.opendev.org/754242 | 18:32 |
ormandj | if you have logs, you can probably glean a lot of account info from them | 18:32 |
ormandj | and just mangle the db | 18:32 |
openstackgerrit | Romain LE DISEZ proposed openstack/swift master: Fix a race condition in case of cross-replication https://review.opendev.org/754242 | 18:38 |
timburke | almost meeting time! | 20:51 |
seongsoocho | \o/ | 20:54 |
kota_ | morning | 20:57 |
mattoliverau | Morning | 20:59 |
*** nicolasbock has joined #openstack-swift | 21:11 | |
nicolasbock | Hi! Does swift support rewriting requests to public (staticweb, tempurl) containers? I am looking to be able to point my browser to `www.example.com` and be redirected to `https://swift-cluster.example.com/v1/AUTH_account/container/object?....` | 21:37 |
timburke | nicolasbock, you'll want to look at the cname_lookup and domain_remap middlewares, but the long and short of it is yeah, that's doable | 21:39 |
nicolasbock | Oh cool | 21:39 |
nicolasbock | Thanks for the pointer timburke ! | 21:39 |
timburke | np! idea is to have www.example.com have a cname record pointing to something like container.auth-account.swift-cluster.example.com, then cname_lookup does the translation in the received host header and domain_remap unpacks the account/container pieces | 21:44 |
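A sketch of the proxy-server.conf side (cname_lookup must come before domain_remap; egg names and the storage_domain option are as in swift's sample config, and the domains are from the example above):

```ini
[pipeline:main]
# excerpt -- auth, caching, etc. omitted
pipeline = catch_errors cname_lookup domain_remap proxy-logging cache proxy-server

[filter:cname_lookup]
use = egg:swift#cname_lookup
storage_domain = swift-cluster.example.com

[filter:domain_remap]
use = egg:swift#domain_remap
storage_domain = swift-cluster.example.com
```

paired with a DNS record like `www.example.com. CNAME container.auth-account.swift-cluster.example.com.`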
nicolasbock | Nice, that doesn't sound too bad | 21:46 |
DHE | I'm just using nginx as a proxy to the proxy (HA!) no tempurl though (but it can be done if need be) | 21:48 |
clayg | wsgi is the worst abstraction for web request processing; except for all the others | 22:04 |
clayg | My 12 year old has his very first football scrimmage tonight! Go Rice Ravens! | 22:05 |
timburke | haha nice! have fun! | 22:05 |
timburke | oh! i also kinda wanted to point out https://review.opendev.org/#/c/751966/ to people -- i still haven't gotten a fips-enabled vm to come back up in a usable state, but it looks like the patch might be about ready | 22:06 |
patchbot | patch 751966 - swift - replace md5 with swift utils version - 11 patch sets | 22:06 |
timburke | merging it will result in a bunch of merge conflicts, so it seems worth having on people's radars | 22:06 |
timburke | and if it means i'll be able to get even just one review from cschwede i'm calling it worth it ;-) | 22:07 |
*** rcernin has joined #openstack-swift | 22:12 | |
*** mikecmpbll has quit IRC | 22:23 | |
*** mikecmpbll has joined #openstack-swift | 22:29 | |
*** tkajinam has joined #openstack-swift | 23:00 | |
openstackgerrit | Tim Burke proposed openstack/liberasurecode master: Be willing to write fragments with legacy crc https://review.opendev.org/738959 | 23:45 |
openstackgerrit | Tim Burke proposed openstack/swift master: ec: Add an option to write fragments with legacy crc https://review.opendev.org/739164 | 23:51 |