*** mikecmpbll has quit IRC | 01:57 | |
*** gkadam has joined #openstack-swift | 02:45 | |
kota_ | alecuyer: when I noticed the patch to remove gRPC, I was feeling the same as clayg. Moving off gRPC because of a technical issue is better than sticking with the older version, just IMO. | 03:47 |
*** gkadam has quit IRC | 03:47 | |
* kota_ is going to go out of his office. will be back to there tomorrow morning. | 05:23 | |
*** axblueblader has joined #openstack-swift | 05:30 | |
openstackgerrit | zhufl proposed openstack/swift master: Do not use self in classmethod https://review.openstack.org/642359 | 06:28 |
*** ianychoi has quit IRC | 06:32 | |
*** ianychoi has joined #openstack-swift | 06:32 | |
*** ianychoi has quit IRC | 06:35 | |
*** ianychoi has joined #openstack-swift | 06:36 | |
axblueblader | hello guys, I'm new to swift and currently researching its object versioning functionality for large objects, is it ok to ask questions on this channel? | 06:40 |
*** e0ne has joined #openstack-swift | 06:51 | |
*** pcaruana has joined #openstack-swift | 07:00 | |
*** e0ne has quit IRC | 07:02 | |
*** rcernin has quit IRC | 07:03 | |
*** axblueblader has quit IRC | 07:53 | |
*** axblueblader has joined #openstack-swift | 08:09 | |
*** tkajinam has quit IRC | 08:17 | |
alecuyer | kota_: OK, then I will go ahead and work from that first patch | 08:22 |
*** axblueblader has quit IRC | 08:25 | |
*** e0ne has joined #openstack-swift | 08:27 | |
*** mikecmpbll has joined #openstack-swift | 08:56 | |
*** hseipp has joined #openstack-swift | 08:59 | |
*** e0ne has quit IRC | 09:22 | |
*** mikecmpbll has quit IRC | 09:33 | |
*** mikecmpbll has joined #openstack-swift | 09:34 | |
*** axblueblader has joined #openstack-swift | 09:36 | |
*** e0ne has joined #openstack-swift | 09:42 | |
*** axblueblader has quit IRC | 09:50 | |
*** e0ne has quit IRC | 10:43 | |
*** e0ne has joined #openstack-swift | 10:47 | |
*** e0ne has quit IRC | 11:37 | |
*** ybunker has joined #openstack-swift | 11:47 | |
ybunker | hi all, unfortunately i'm still facing problems with the account & container replication on the new nodes.. i really don't know what else to look for, so if someone can give me a hand or a hint on this i would really appreciate it :), here are the configuration files and some output from the logs: http://pasted.co/f931a29a | 12:04 |
ybunker | if more info is needed to do some troubleshooting on this please don't hesitate to ask | 12:04 |
ybunker | objects are replicating fine | 12:07 |
*** [diablo] has quit IRC | 12:26 | |
*** [diablo] has joined #openstack-swift | 12:29 | |
*** e0ne has joined #openstack-swift | 12:30 | |
ybunker | anyone? | 12:39 |
*** [diablo] has quit IRC | 12:43 | |
*** hseipp has quit IRC | 13:03 | |
*** ianychoi has quit IRC | 13:28 | |
*** ianychoi has joined #openstack-swift | 13:29 | |
*** e0ne has quit IRC | 14:14 | |
zaitcev | ybunker: sorry, they are mostly in California | 14:16 |
*** e0ne has joined #openstack-swift | 14:23 | |
ybunker | oh i see :( | 14:24 |
*** e0ne has quit IRC | 14:36 | |
*** e0ne has joined #openstack-swift | 14:38 | |
*** mrjk has quit IRC | 14:43 | |
zaitcev | In theory I should be able to help, but in practice... BTW, those configs look very strange for a multinode. The rsync.conf looks like someone adapted a SAIO for multinode. | 14:43 |
zaitcev | here, take a look - http://www.zaitcev.us/things/swift/rhev-24c-01.etc.rsyncd.conf | 14:46 |
zaitcev | Oh, wait, n/m | 14:47 |
zaitcev | They are SAIOs with 1.conf, 2.conf etc | 14:48 |
*** mrjk has joined #openstack-swift | 14:48 | |
zaitcev | ybunker: you must make sure that account ring includes devices with port 4103 | 14:50 |
*** FlorianFa has joined #openstack-swift | 14:50 | |
ybunker | zaitcev: let me send you the account ring just to double check | 14:50 |
zaitcev | ybunker: please don't :-) Run swift-ring-builder account.builder without any more arguments and double-check that devices in the ring correspond to nodes and ports where account servers listen. | 14:51 |
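(A quick sketch of that check for readers following along; the builder filename is the one under discussion, and the comments describe intent rather than exact output.)

```bash
# Print the ring summary and device table; with no subcommand, nothing is modified.
swift-ring-builder account.builder

# In the device table, compare each device's ip:port (and replication ip:port)
# against the addresses and ports the account-server processes actually bind to.
```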
*** FlorianFa has quit IRC | 14:52 | |
*** FlorianFa has joined #openstack-swift | 14:52 | |
ybunker | zaitcev: yes, they are ok (if you wanna take a look :) -> http://pasted.co/70b963c5) | 14:54 |
zaitcev | ybunker: make sure the replicator uses the correct ring, then. Maybe you have some docker thrown in or whatever. I cannot really tell. | 15:05 |
zaitcev | I see that 10.2.1.19:4103 is in the builder file at least. | 15:06 |
zaitcev | The "rebalance" stage writes out account.ring.gz | 15:06 |
zaitcev | Then you scp it from admin workstation to swift-node09: | 15:07 |
zaitcev | Well, I'm sure you know all that. | 15:07 |
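(For context, the rebalance-and-distribute flow zaitcev is alluding to, as a hedged sketch; the hostname and paths follow the ones mentioned in this conversation and may differ in the real cluster.)

```bash
# On the admin workstation: rebalance recomputes assignments and writes
# account.ring.gz next to account.builder.
swift-ring-builder account.builder rebalance

# Copy the fresh ring to each storage node (swift-node09 is the node discussed here)
# and restart the account services so they reload it.
scp account.ring.gz swift-node09:/etc/swift/account.ring.gz
ssh swift-node09 swift-init account-server account-replicator restart
```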
*** mrjk has quit IRC | 15:31 | |
*** mrjk has joined #openstack-swift | 15:32 | |
*** mrjk has quit IRC | 15:32 | |
*** mrjk has joined #openstack-swift | 15:32 | |
ybunker | yes, and then i restart the services on the nodes | 15:48 |
ybunker | zaitcev: i have the following permissions on the rings: -rw-r--r-- 1 swift swift | 15:49 |
zaitcev | ybunker: Permissions are not a factor, because the replicator would've bailed if it could not read the ring. Look, it's a very simple process. When the replicator starts, it gets a list of IPs available for it to listen on (unless bind_ip is set). Then it reads the ring and searches for itself in it. It's as simple as pie! | 16:12 |
notmyname | good morning | 16:13 |
zaitcev | ybunker: You have a very convoluted, strange configuration. I cannot diagnose it for you using just hearsay, sorry. | 16:13 |
ybunker | zaitcev: i know.. but the problem is that every config file looks good, so i don't know where else to look | 16:14 |
ybunker | zaitcev: no problem, thanks for the tips :) | 16:14 |
zaitcev | ybunker: oh just strace the replicator then and look at what it opens and reads. Then md5sum the file. It's a heavy-weight method, but then you'll see that it reads from a place where you didn't copy that ring.gz and it's reading something obsolete. Or you forgot the rebalance. Or heck, I saw people run rebalance, it errors out because min_hours is not reached, then they blindly copy the old ring.gz to node... | 16:15 |
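(A rough sketch of that strace check; the pgrep pattern is an assumption about how the replicator shows up in the process list.)

```bash
# Watch which files the running account-replicator opens, and pick out the ring it loads.
strace -f -e trace=open,openat -p "$(pgrep -f account-replicator | head -1)" 2>&1 | grep ring.gz

# Then checksum the ring you think it should be reading and compare.
md5sum /etc/swift/account.ring.gz
```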
tdasilva | ybunker: are you able to get recon info from the account servers on those ports? | 16:16 |
ybunker | zaitcev: will check on that and get back | 16:16 |
*** mrjk has quit IRC | 16:17 | |
zaitcev | Oh, actually, what tdasilva says. That gives the md5sum of the ring the server sees (well, hopefully it's the same as the one the replicator sees, because in the docker world you can't even be sure of that much, le sigh). | 16:17 |
tdasilva | something like: curl `http://10.1.1.11:4101/recon/devices` or something like that... | 16:19 |
tdasilva | and then try for the replication ip:port also | 16:19 |
zaitcev | 10.2.1.19:4103 in his case | 16:19 |
tdasilva | I always like to add the healthcheck middleware to the pipeline too, cause it's a quick way to just get a heartbeat on the service.... | 16:20 |
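(Concretely, those checks would look something like the following against the node in question; they assume the recon middleware is in the account-server pipeline, and /healthcheck only answers if that middleware is configured too.)

```bash
# Device list as seen by the account-server on its front port:
curl http://10.2.1.19:4103/recon/devices

# Same check against the replication listener (substitute that node's replication ip:port):
curl http://REPLICATION_IP:REPLICATION_PORT/recon/devices

# MD5s of the rings the server has loaded, to compare with md5sum /etc/swift/*.ring.gz:
curl http://10.2.1.19:4103/recon/ringmd5

# Quick heartbeat, if the healthcheck middleware is in the pipeline:
curl http://10.2.1.19:4103/healthcheck
```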
openstackgerrit | Tim Burke proposed openstack/swift master: Stop monkey-patching mimetools https://review.openstack.org/640552 | 16:20 |
*** pcaruana has quit IRC | 16:23 | |
*** pcaruana has joined #openstack-swift | 16:23 | |
ybunker | also i notice that some object disks are more used (in terms of %) than others, so maybe something is going on with the replication ring | 16:30 |
*** e0ne has quit IRC | 16:41 | |
*** gyee has joined #openstack-swift | 16:46 | |
zaitcev | ybunker: maybe, but let's focus on the problem at hand, which is the replicator not finding itself in the ring, according to your pastebin. Once you've got them running solid on all nodes, then you can look at utilization. | 16:57 |
ybunker | zaitcev: yep | 16:57 |
*** e0ne has joined #openstack-swift | 16:57 | |
*** mrjk has joined #openstack-swift | 17:04 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 17:05 |
*** e0ne has quit IRC | 17:13 | |
*** patchbot has quit IRC | 17:14 | |
*** patchbot has joined #openstack-swift | 17:14 | |
*** e0ne has joined #openstack-swift | 17:14 | |
zaitcev | timburke: I'm very sorry but I'm very confused! I went here... then grepped BaseMessage https://git.openstack.org/cgit/openstack/swift/tree/swift/common/wsgi.py?id=fac7d743db49858c17228f3ebb470948dae7cc23#n429 | 17:17 |
timburke | bah! i meant to switch *all* the BaseMessage stuff to wsgi.HttpProtocol.MessageClass... | 17:18 |
zaitcev | oh | 17:18 |
zaitcev | I thought it was some magic way to see some class methods or whatever | 17:19 |
openstackgerrit | Tim Burke proposed openstack/swift master: Stop monkey-patching mimetools https://review.openstack.org/640552 | 17:22 |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 17:22 |
zaitcev | timburke: Why not use __super__? Just asking. | 17:29 |
timburke | zaitcev, py2 | 17:29 |
timburke | or, were you thinking super(..., self).blahblahblah? mimetools.Message doesn't inherit from object iirc | 17:30 |
zaitcev | Oh god | 17:31 |
*** e0ne has quit IRC | 17:31 | |
*** mikecmpbll has quit IRC | 17:32 | |
timburke | yeah, basically. :-( | 17:34 |
timburke | https://github.com/python/cpython/blob/v2.7.15/Lib/rfc822.py#L85 | 17:34 |
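(A tiny py2-only illustration of the problem being discussed; the subclass and the header value are made up, the stdlib modules are real.)

```python
import StringIO
import mimetools


class MyMessage(mimetools.Message):
    def __init__(self, fp, seekable=1):
        # mimetools.Message is an old-style class (rfc822.Message never inherits
        # from object), so super() rejects it outright:
        #   super(MyMessage, self).__init__(fp, seekable)
        #   TypeError: super() argument 1 must be type, not classobj
        # The explicit parent call is the only way through:
        mimetools.Message.__init__(self, fp, seekable)


msg = MyMessage(StringIO.StringIO("Content-Type: application/json\r\n\r\n"))
print msg.gettype()  # -> application/json
```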
ybunker | mmm when i query the account.builder ring: account.builder, build version 260, id (not assigned) | 17:41 |
ybunker | that "id (not assigned)" is ok? | 17:41 |
ybunker | zaitcev: also i notice that the swift_container_server.log file is showing all 404s on PUTs on all the nodes | 18:01 |
ybunker | tdasilva: i have run the curl to recon devices and im getting the full list of disks: {"/srv/node": ["3", "1", "7", "6", "9", "4", "10", "11", "5", "2", "12", "8"]} | 18:08 |
tdasilva | ybunker: and you did that for all account servers ip:port combos? and replication combos? | 18:14 |
zaitcev | I'd start with checking 10.2.1.19:4301/recon/ringmd5 | 18:20 |
*** e0ne has joined #openstack-swift | 18:23 | |
*** e0ne has quit IRC | 18:25 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 18:25 |
ybunker | zaitcev: would it be 10.2.1.19:4103 instead of 4301? | 18:28 |
zaitcev | ybunker: possibly. | 18:28 |
ybunker | zaitcev: ok, its the same for all the nodes | 18:30 |
ybunker | zaitcev: the md5 in the master node is the same on all the nodes | 18:36 |
ybunker | http://pasted.co/545a2c7a | 18:39 |
zaitcev | ybunker: do you still get the "swift-node09 account-replicator: Can't find itself" or has it stopped? | 18:56 |
ybunker | zaitcev: still on | 18:57 |
*** e0ne has joined #openstack-swift | 18:57 | |
ybunker | zaitcev: what if i configure the account-replicator for each of the disks inside /etc/swift/account-server/ and add rsync_module = 10.2.1.19::account4201 ... then account4202 and so on for the rest ? | 18:58 |
zaitcev | ybunker: okay, does that md5 match the one you get with md5sum /etc/swift/account.ring.gz (or what is the right path on the master or admin system)? | 18:58 |
ybunker | zaitcev: yes, its the same | 18:59 |
zaitcev | ybunker: and the timestamp of it is newer than the builder? Just with ls -lt /etc/swift/account* | 19:00 |
zaitcev | .ring.gz has to be on top of .builder | 19:00 |
ybunker | yes | 19:04 |
ybunker | zaitcev: if i look at the rsync logs i found : unknown module 'container' tried from swift-node01 (10.2.1.11) | 19:04 |
zaitcev | Yes, you have a ton of problems there. But they do not matter unless you get replicators actually working. | 19:05 |
ybunker | yep | 19:05 |
zaitcev | Using 1.conf 2.conf etc. on a multi-node setup is absurd. When such a SAIO-like setup is in place, rsync must add the port number to that thing, I forget what it calls it, maybe "module". There's an obscure setting for it that SAIO sets. It's called something like "vm_mode". BUT | 19:07 |
zaitcev | BUT it's much better to junk those crazy 1.conf and 2.conf files and just use a normal setup in production. Then you never have those "unknown module" things happening in rsync. | 19:07 |
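(To illustrate the mismatch zaitcev is describing, a sketch of an rsyncd.conf in that SAIO-per-port style; the module names and paths are assumptions based on the ports quoted in this log, not ybunker's real file.)

```
# SAIO-style rsyncd.conf fragment: the module names carry the replication port,
# which is what a replicator running with the SAIO/vm_test_mode setting asks for.
[account4203]
path = /srv/node
read only = false
lock file = /var/lock/account4203.lock

[container4203]
path = /srv/node
read only = false
lock file = /var/lock/container4203.lock

# A replicator started without that setting asks for plain "account"/"container",
# and rsync logs "unknown module 'container'" because no such module is defined.
```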
zaitcev | Anyway | 19:08 |
ybunker | yeah, once i get this thing working i will then move on and take some corrective actions on the configurations, but first i need to get this thing to work :( | 19:09 |
zaitcev | Right | 19:09 |
zaitcev | oh | 19:14 |
zaitcev | I think I see it now | 19:14 |
zaitcev | In http://pasted.co/f931a29a, replicator says that it wants port 4103, right? But that port is only used for the access network, not for replication network. Look into http://pasted.co/70b963c5. All instances of 4103 are in the left column. | 19:16 |
zaitcev | And you have two sets of listeners, 3.conf listens on 4103 and r_3.conf listens on 4203. | 19:17 |
zaitcev | I can only conclude that your replicator uses 3.conf. I don't know how you start it, but it just does. Then it sees itself binding to 4103, tries to find that among the replication ports in the ring, and fails. | 19:18 |
zaitcev | You need to make sure that your replicator uses r_3.conf, or whatever conf its listener actually uses. Edit some systemd unit files or whatever. | 19:19 |
zaitcev | Or here's an even better suggestion to start | 19:20 |
zaitcev | Just don't use the replication network at first | 19:20 |
ybunker | mmm and how can i change that so it works? can i try to put it all in one config file instead of having 3 ? (it's an inherited cluster :-( ) | 19:20 |
zaitcev | Get it all running with just one network, and have rings with port and replication_port set to the same value at first. Then, once you've got that debugged, add a distinct replication network. | 19:21 |
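(As an illustration of "port and replication_port set to the same value": this is roughly how a device entry would be added with a single network; region, zone, device name and weight here are placeholders, not the real layout.)

```bash
swift-ring-builder account.builder add \
    --region 1 --zone 1 \
    --ip 10.2.1.19 --port 4103 \
    --replication-ip 10.2.1.19 --replication-port 4103 \
    --device 1 --weight 100
swift-ring-builder account.builder rebalance
```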
zaitcev | Oooh... | 19:21 |
zaitcev | So someone invented this mad config and dumped it on you. | 19:21 |
ybunker | yes!!! exactly :-( and its a pain in the... hahah | 19:22 |
zaitcev | So, how is the account replicator started? Is it swift-init, SystemD, or something else? | 19:23 |
ybunker | swift-init | 19:24 |
zaitcev | You have to find that out. There, it gets its argument... it's a path. That path must be such that it ends up reading r_3.conf instead of 3.conf. | 19:24 |
zaitcev | Hmm, wait. That one may be computing the paths. | 19:24 |
ybunker | and i see processes for 1.conf 2.conf 3.conf and r_1.conf r_2.conf and r_3.conf | 19:25 |
zaitcev | But maybe it gets confused when you have both, because it never expected that to happen. | 19:25 |
ybunker | is there a way to get rid of those r_x.conf files and just use one config file for acct, another for cont and one for obj ? | 19:26 |
ybunker | at least on the new server, and then i will try to change the rest of the cluster | 19:27 |
zaitcev | Yes, there is, but that effectively switches off the replication network and uses the front-to-back network for replication traffic. | 19:27 |
zaitcev | I'm afraid to suggest that outright, because who knows how much network capacity you're using right now. Merging them may bring the whole thing to its knees. | 19:28 |
zaitcev | Hold on, let me check if you can force replicator's port | 19:29 |
ybunker | thanks a lot zaitcev | 19:29 |
zaitcev | running vi etc/account-server.conf-sample, let's see.. | 19:29 |
zaitcev | nope, it's not possible | 19:31 |
ybunker | :-( | 19:31 |
zaitcev | ybunker: when you said "i see processes for ... r_3.conf", what does it actually include? Could it be that you have _two_ account replicators running: one for 3.conf, which does nothing, and one for r_3.conf? | 19:32 |
zaitcev | If so, then all you need is to prevent the unnecessary and confusing extra replicator from starting. | 19:34 |
ybunker | zaitcev: http://pasted.co/d36d3110 | 19:34 |
zaitcev | Ah, yes, a full set :-) | 19:35 |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 19:36 |
zaitcev | So, try to comment out [account-replicator] from 3.conf and do swift-init account-replicator stop, then start. Check that there aren't any bad messages. | 19:36 |
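(Roughly, the layout that leaves behind; the section contents here are guessed from what has been quoted in this conversation, and the real files certainly contain more settings.)

```
# 3.conf — front-side account-server, replicator section commented out:
[DEFAULT]
bind_port = 4103
# ...
#[account-replicator]
#vm_test_mode = yes

# r_3.conf — replication-side, keeps its replicator:
[DEFAULT]
bind_port = 4203
# ...
[account-replicator]
vm_test_mode = yes
```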
ybunker | ok, well the errors seem to have gone away :-) | 19:39 |
ybunker | but... still not replicating :( | 19:39 |
zaitcev | How do you know? | 19:40 |
zaitcev | number of passes remains at zero in recon? | 19:40 |
*** e0ne has quit IRC | 19:40 | |
zaitcev | Or some other method? | 19:40 |
*** e0ne has joined #openstack-swift | 19:40 | |
ybunker | looking at the swift_account_server.log log file, and listing the contents of /srv/node/{1|2|3} no accounts directory appears | 19:40 |
ybunker | in the [account-replicator] i only have vm_test_mode = yes | 19:43 |
zaitcev | Right, that's how r_3.conf is set up. That setting appends "4203" to "account". | 19:43 |
zaitcev | So, actually, if you walk through all of ?.conf and comment out replicators, you should get rid of rsync complaining about the naked "container" module. | 19:44 |
zaitcev | Although | 19:44 |
zaitcev | That actually means that one of those extra container replicators managed to find itself in the ring and tried to replicate. | 19:45 |
ybunker | oh ok i see | 19:47 |
ybunker | and on the main account-replicator.conf | 19:48 |
zaitcev | Yeah, it's a little surprising that you have both that and account/1.conf | 19:48 |
ybunker | do i need to also comment out the [account-replicator] section there? or leave it with log_facility, concurrency = 1 and vm_test_mode = yes ? | 19:48 |
zaitcev | I don't know for sure. You have to check if it owns any devices in the ring... | 19:50 |
zaitcev | If you do need it, then it must keep vm_test_mode, because that's how all those rsync listeners are set up. | 19:51 |
ybunker | the "unknown module 'container' tried from" errors continue inside the rsyncd.log | 19:52 |
zaitcev | You still have a container replicator without vm_test_mode somewhere in the cluster. Rsync should also tell you the IP of the offending node. | 19:53 |
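(One hypothetical way to do that sweep on each node: list the container-server configs that define a replicator section but never mention vm_test_mode. The directory path and the grep approach are assumptions, not a Swift tool.)

```bash
grep -rl '^\[container-replicator\]' /etc/swift/container-server/ \
    | xargs grep -L 'vm_test_mode'
```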
ybunker | zaitcev: will review all the cluster and get back tomorrow hopefully with some news :), thanks a lot zaitcev for all the help today, really appreciated! | 19:54 |
*** ybunker has quit IRC | 19:56 | |
*** e0ne has quit IRC | 20:11 | |
*** e0ne has joined #openstack-swift | 20:12 | |
*** e0ne has quit IRC | 20:15 | |
*** itlinux has joined #openstack-swift | 20:30 | |
*** itlinux has quit IRC | 20:34 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 20:38 |
*** e0ne has joined #openstack-swift | 20:56 | |
*** itlinux has joined #openstack-swift | 20:56 | |
*** e0ne has quit IRC | 21:04 | |
*** e0ne has joined #openstack-swift | 21:12 | |
*** e0ne has quit IRC | 21:38 | |
*** pcaruana has quit IRC | 21:45 | |
timburke | clayg, notmyname: you guys expressed some opinions on https://review.openstack.org/#/c/640552/ -- got some review bandwidth to second zaitcev's +2? | 21:49 |
patchbot | patch 640552 - swift - Stop monkey-patching mimetools - 5 patch sets | 21:49 |
*** itlinux has quit IRC | 21:55 | |
notmyname | timburke: so if I understand p 640552 correctly, we used to patch mimetools to default to None so we can detect when the client doesn't set it. and now we do it with the `protocol_class` that's passed in to the wsgi server | 22:42 |
patchbot | https://review.openstack.org/#/c/640552/ - swift - Stop monkey-patching mimetools - 5 patch sets | 22:42 |
notmyname | although I'm not sure where the `protocol_class` thing gets used. eventlet maybe? | 22:43 |
notmyname | also a google search for "eventlet wsgi 'protocol_class'" only shows 6 results, 5 from swift's codebase | 22:46 |
*** threestrands has joined #openstack-swift | 22:50 | |
*** tkajinam has joined #openstack-swift | 22:56 | |
mattoliverau | morning | 22:58 |
zaitcev | I went directly to "git clone eventlet" to see how Tim chose to wedge this stuff in, seemed agreeable. | 23:07 |
notmyname | ya, he just explained it to me too. looks good when you finally see where it's all being called from | 23:10 |
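(For readers without the patch open, a rough py2 sketch of the approach as described here; this is a paraphrase of the idea, not the code under review, and it assumes eventlet's HttpProtocol still honours the stdlib MessageClass hook.)

```python
from eventlet import wsgi


class NoDefaultTypeMessage(wsgi.HttpProtocol.MessageClass):
    """Header parser that records None when the client sent no Content-Type,
    instead of mimetools' silent text/plain default."""
    def parsetype(self):
        if self.typeheader is None:
            # No Content-Type header from the client: say so instead of
            # pretending it was text/plain.
            self.type = self.maintype = self.subtype = None
            self.plisttext = ''
        else:
            wsgi.HttpProtocol.MessageClass.parsetype(self)


class PatchedHttpProtocol(wsgi.HttpProtocol):
    # BaseHTTPRequestHandler instantiates MessageClass to parse request headers,
    # so overriding it here is scoped to this server instead of monkey-patching
    # the mimetools module globally.
    MessageClass = NoDefaultTypeMessage

# eventlet.wsgi.server(sock, app, protocol=PatchedHttpProtocol) would then use it.
```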
timburke | notmyname, +1?? are you a core or not!? :P | 23:17 |
*** threestrands has quit IRC | 23:45 |