*** mikecmpbll has quit IRC | 01:57 | |
*** gkadam has joined #openstack-swift | 02:45 | |
kota_ | alecuyer: when I noticed the patch to remove gRPC, I was feeling the same as clayg. Moving off gRPC because of a technical issue is better than sticking with the older version, just IMO. | 03:47 |
*** gkadam has quit IRC | 03:47 | |
* kota_ is going to go out of his office. will be back to there tomorrow morning. | 05:23 | |
*** axblueblader has joined #openstack-swift | 05:30 | |
openstackgerrit | zhufl proposed openstack/swift master: Do not use self in classmethod https://review.openstack.org/642359 | 06:28 |
*** ianychoi has quit IRC | 06:32 | |
*** ianychoi has joined #openstack-swift | 06:32 | |
*** ianychoi has quit IRC | 06:35 | |
*** ianychoi has joined #openstack-swift | 06:36 | |
axblueblader | hello guys, I'm new to swift and currently researching its object versioning functionality for large objects, is it ok to ask questions on this channel? | 06:40 |
*** e0ne has joined #openstack-swift | 06:51 | |
*** pcaruana has joined #openstack-swift | 07:00 | |
*** e0ne has quit IRC | 07:02 | |
*** rcernin has quit IRC | 07:03 | |
*** axblueblader has quit IRC | 07:53 | |
*** axblueblader has joined #openstack-swift | 08:09 | |
*** tkajinam has quit IRC | 08:17 | |
alecuyer | kota_: OK, then I will go ahead and work from that first patch | 08:22 |
*** axblueblader has quit IRC | 08:25 | |
*** e0ne has joined #openstack-swift | 08:27 | |
*** mikecmpbll has joined #openstack-swift | 08:56 | |
*** hseipp has joined #openstack-swift | 08:59 | |
*** e0ne has quit IRC | 09:22 | |
*** mikecmpbll has quit IRC | 09:33 | |
*** mikecmpbll has joined #openstack-swift | 09:34 | |
*** axblueblader has joined #openstack-swift | 09:36 | |
*** e0ne has joined #openstack-swift | 09:42 | |
*** axblueblader has quit IRC | 09:50 | |
*** e0ne has quit IRC | 10:43 | |
*** e0ne has joined #openstack-swift | 10:47 | |
*** e0ne has quit IRC | 11:37 | |
*** ybunker has joined #openstack-swift | 11:47 | |
ybunker | hi all, unfortunately i'm still facing problems with the account & container replication on the new nodes.. i really don't know what else to look for, so if someone can give me a hand or a hint on this i would really appreciate it :), here are the configuration files and some output from the logs: http://pasted.co/f931a29a | 12:04 |
ybunker | if more info is needed to do some troubleshooting on this please don't hesitate to ask | 12:04 |
ybunker | objects are replicating fine | 12:07 |
*** [diablo] has quit IRC | 12:26 | |
*** [diablo] has joined #openstack-swift | 12:29 | |
*** e0ne has joined #openstack-swift | 12:30 | |
ybunker | anyone? | 12:39 |
*** [diablo] has quit IRC | 12:43 | |
*** hseipp has quit IRC | 13:03 | |
*** ianychoi has quit IRC | 13:28 | |
*** ianychoi has joined #openstack-swift | 13:29 | |
*** e0ne has quit IRC | 14:14 | |
zaitcev | ybunker: sorry, they are mostly in California | 14:16 |
*** e0ne has joined #openstack-swift | 14:23 | |
ybunker | oh i see :( | 14:24 |
*** e0ne has quit IRC | 14:36 | |
*** e0ne has joined #openstack-swift | 14:38 | |
*** mrjk has quit IRC | 14:43 | |
zaitcev | In theory I should be able to help, but in practice... BTW, those configs look very strange for a multinode. The rsync.conf looks like someone adapted a SAIO for multinode. | 14:43 |
zaitcev | here, take a look - http://www.zaitcev.us/things/swift/rhev-24c-01.etc.rsyncd.conf | 14:46 |
zaitcev | Oh, wait, n/m | 14:47 |
zaitcev | They are SAIOs with 1.conf, 2.conf etc | 14:48 |
*** mrjk has joined #openstack-swift | 14:48 | |
zaitcev | ybunker: you must make sure that account ring includes devices with port 4103 | 14:50 |
*** FlorianFa has joined #openstack-swift | 14:50 | |
ybunker | zaitcev: let me send you the account ring just to double check | 14:50 |
zaitcev | ybunker: please don't :-) Run swift-ring-builder account.builder without any more arguments and double-check that devices in the ring correspond to nodes and ports where account servers listen. | 14:51 |
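(A quick sketch of that check for readers following along; the builder filename is the one under discussion, and the comments describe intent rather than exact output.)

```bash
# Print the ring summary and device table; with no subcommand, nothing is modified.
swift-ring-builder account.builder

# In the device table, compare each device's ip:port (and replication ip:port)
# against the addresses and ports the account-server processes actually bind to.
```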
*** FlorianFa has quit IRC | 14:52 | |
*** FlorianFa has joined #openstack-swift | 14:52 | |
ybunker | zaitcev: yes, they are ok (if you wanna take a look :) -> http://pasted.co/70b963c5) | 14:54 |
zaitcev | ybunker: make sure the replicator uses the correct ring, then. Maybe you have some docker thrown in or whatever. I cannot really tell. | 15:05 |
zaitcev | I see that 10.2.1.19:4103 is in the builder file at least. | 15:06 |
zaitcev | The "rebalance" stage writes out account.ring.gz | 15:06 |
zaitcev | Then you scp it from admin workstation to swift-node09: | 15:07 |
zaitcev | Well, I'm sure you know all that. | 15:07 |
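(For context, the rebalance-and-distribute flow zaitcev is alluding to, as a hedged sketch; the hostname and paths follow the ones mentioned in this conversation and may differ in the real cluster.)

```bash
# On the admin workstation: rebalance recomputes assignments and writes
# account.ring.gz next to account.builder.
swift-ring-builder account.builder rebalance

# Copy the fresh ring to each storage node (swift-node09 is the node discussed here)
# and restart the account services so they reload it.
scp account.ring.gz swift-node09:/etc/swift/account.ring.gz
ssh swift-node09 swift-init account-server account-replicator restart
```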
*** mrjk has quit IRC | 15:31 | |
*** mrjk has joined #openstack-swift | 15:32 | |
*** mrjk has quit IRC | 15:32 | |
*** mrjk has joined #openstack-swift | 15:32 | |
ybunker | yes, and then i restart the services on the nodes | 15:48 |
ybunker | zaitcev: i have the following permissions on the rings: -rw-r--r-- 1 swift swift | 15:49 |
zaitcev | ybunker: Permissions are not a factor, because the replicator would've bailed if it could not read the ring. Look, it's a very simple process. When the replicator starts, it gets a list of IPs available for it to listen on (unless bind_ip is set). Then it reads the ring and searches for itself in it. It's as simple as pie! | 16:12 |
notmyname | good morning | 16:13 |
zaitcev | ybunker: You have a very convoluted, strange configuration. I cannot diagnose it for you using just hearsay, sorry. | 16:13 |
ybunker | zaitcev: i know.. but the problem is that every config file looks good, so i don't know where else to look | 16:14 |
ybunker | zaitcev: no problem, thanks for the tips :) | 16:14 |
zaitcev | ybunker: oh just strace the replicator then and look at what it opens and reads. Then md5sum the file. It's a heavy-weight method, but then you'll see that it reads from a place where you didn't copy that ring.gz and it's reading something obsolete. Or you forgot the rebalance. Or heck, I saw people run rebalance, it errors out because min_hours is not reached, then they blindly copy the old ring.gz to node... | 16:15 |
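(A rough sketch of that strace check; the pgrep pattern is an assumption about how the replicator shows up in the process list.)

```bash
# Watch which files the running account-replicator opens, and pick out the ring it loads.
strace -f -e trace=open,openat -p "$(pgrep -f account-replicator | head -1)" 2>&1 | grep ring.gz

# Then checksum the ring you think it should be reading and compare.
md5sum /etc/swift/account.ring.gz
```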
tdasilva | ybunker: are you able to get recon info from the account servers on those ports? | 16:16 |
ybunker | zaitcev: will check on that and get back | 16:16 |
*** mrjk has quit IRC | 16:17 | |
zaitcev | Oh, actually, what tdasilva says. That gives the md5sum of the ring the server sees (well, hopefully it's the same as the one the replicator sees, because in the docker world you can't even be sure of that much, le sigh). | 16:17 |
tdasilva | something like: curl `http://10.1.1.11:4101/recon/devices` or something like that... | 16:19 |
tdasilva | and then try for the replication ip:port also | 16:19 |
zaitcev | 10.2.1.19:4103 in his case | 16:19 |
tdasilva | I always like to add the healthcheck middleware to the pipeline too, cause it's a quick way to just get a heartbeat on the service.... | 16:20 |
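(Concretely, those checks would look something like the following against the node in question; they assume the recon middleware is in the account-server pipeline, and /healthcheck only answers if that middleware is configured too.)

```bash
# Device list as seen by the account-server on its front port:
curl http://10.2.1.19:4103/recon/devices

# Same check against the replication listener (substitute that node's replication ip:port):
curl http://REPLICATION_IP:REPLICATION_PORT/recon/devices

# MD5s of the rings the server has loaded, to compare with md5sum /etc/swift/*.ring.gz:
curl http://10.2.1.19:4103/recon/ringmd5

# Quick heartbeat, if the healthcheck middleware is in the pipeline:
curl http://10.2.1.19:4103/healthcheck
```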
openstackgerrit | Tim Burke proposed openstack/swift master: Stop monkey-patching mimetools https://review.openstack.org/640552 | 16:20 |
*** pcaruana has quit IRC | 16:23 | |
*** pcaruana has joined #openstack-swift | 16:23 | |
ybunker | also i notice that some object disks are more used (in terms of %) than others, so maybe something is going on with the replication ring | 16:30 |
*** e0ne has quit IRC | 16:41 | |
*** gyee has joined #openstack-swift | 16:46 | |
zaitcev | ybunker: maybe, but let's focus on the problem at hand, which is the replicator not finding itself in the ring, according to your pastebin. Once you've got them running solid on all nodes, then you can look at utilization. | 16:57 |
ybunker | zaitcev: yep | 16:57 |
*** e0ne has joined #openstack-swift | 16:57 | |
*** mrjk has joined #openstack-swift | 17:04 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 17:05 |
*** e0ne has quit IRC | 17:13 | |
*** patchbot has quit IRC | 17:14 | |
*** patchbot has joined #openstack-swift | 17:14 | |
*** e0ne has joined #openstack-swift | 17:14 | |
zaitcev | timburke: I'm very sorry but I'm very confused! I went here... then grepped BaseMessage https://git.openstack.org/cgit/openstack/swift/tree/swift/common/wsgi.py?id=fac7d743db49858c17228f3ebb470948dae7cc23#n429 | 17:17 |
timburke | bah! i meant to switch *all* the BaseMessage stuff to wsgi.HttpProtocol.MessageClass... | 17:18 |
zaitcev | oh | 17:18 |
zaitcev | I thought it was some magic way to see some class methods or whatever | 17:19 |
openstackgerrit | Tim Burke proposed openstack/swift master: Stop monkey-patching mimetools https://review.openstack.org/640552 | 17:22 |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 17:22 |
zaitcev | timburke: Why not use __super__? Just asking. | 17:29 |
timburke | zaitcev, py2 | 17:29 |
timburke | or, were you thinking super(..., self).blahblahblah? mimetools.Message doesn't inherit from object iirc | 17:30 |
zaitcev | Oh god | 17:31 |
*** e0ne has quit IRC | 17:31 | |
*** mikecmpbll has quit IRC | 17:32 | |
timburke | yeah, basically. :-( | 17:34 |
timburke | https://github.com/python/cpython/blob/v2.7.15/Lib/rfc822.py#L85 | 17:34 |
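(A tiny py2-only illustration of the problem being discussed; the subclass and the header value are made up, the stdlib modules are real.)

```python
import StringIO
import mimetools


class MyMessage(mimetools.Message):
    def __init__(self, fp, seekable=1):
        # mimetools.Message is an old-style class (rfc822.Message never inherits
        # from object), so super() rejects it outright:
        #   super(MyMessage, self).__init__(fp, seekable)
        #   TypeError: super() argument 1 must be type, not classobj
        # The explicit parent call is the only way through:
        mimetools.Message.__init__(self, fp, seekable)


msg = MyMessage(StringIO.StringIO("Content-Type: application/json\r\n\r\n"))
print msg.gettype()  # -> application/json
```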
ybunker | mmm when i query the account.builder ring: account.builder, build version 260, id (not assigned) | 17:41 |
ybunker | that "id (not assigned)" is ok? | 17:41 |
ybunker | zaitcev: also i notice that the swift_container_server.log file is showing all 404s on PUTs on all the nodes | 18:01 |
ybunker | tdasilva: i have run the curl to recon devices and im getting the full list of disks: {"/srv/node": ["3", "1", "7", "6", "9", "4", "10", "11", "5", "2", "12", "8"]} | 18:08 |
tdasilva | ybunker: and you did that for all account servers ip:port combos? and replication combos? | 18:14 |
zaitcev | I'd start with checking 10.2.1.19:4301/recon/ringmd5 | 18:20 |
*** e0ne has joined #openstack-swift | 18:23 | |
*** e0ne has quit IRC | 18:25 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 18:25 |
ybunker | zaitcev: would it be 10.2.1.19:4103 instead of 4301? | 18:28 |
zaitcev | ybunker: possibly. | 18:28 |
ybunker | zaitcev: ok, its the same for all the nodes | 18:30 |
ybunker | zaitcev: the md5 in the master node is the same on all the nodes | 18:36 |
ybunker | http://pasted.co/545a2c7a | 18:39 |
zaitcev | ybunker: do you still get the "swift-node09 account-replicator: Can't find itself" or has it stopped? | 18:56 |
ybunker | zaitcev: still on | 18:57 |
*** e0ne has joined #openstack-swift | 18:57 | |
ybunker | zaitcev: what if i configure the account-replicator for each of the disks inside /etc/swift/account-server/ and add rsync_module = 10.2.1.19::account4201 ... then account4202 and so on for the rest ? | 18:58 |
zaitcev | ybunker: okay, does that md5 match the one you get with md5sum /etc/swift/account.ring.gz (or what is the right path on the master or admin system)? | 18:58 |
ybunker | zaitcev: yes, its the same | 18:59 |
zaitcev | ybunker: and the timestamp of it is newer than the builder? Just with ls -lt /etc/swift/account* | 19:00 |
zaitcev | .ring.gz has to be on top of .builder | 19:00 |
ybunker | yes | 19:04 |
ybunker | zaitcev: if i look at the rsync logs i found : unknown module 'container' tried from swift-node01 (10.2.1.11) | 19:04 |
zaitcev | Yes, you have a ton of problems there. But they do not matter unless you get replicators actually working. | 19:05 |
ybunker | yep | 19:05 |
zaitcev | Using 1.conf 2.conf etc. on a multi-node setup is absurd. When such a SAIO-like setup is in place, rsync must add the port number to that thing, I forget what it calls it, maybe "module". There's an obscure setting for it that SAIO sets. It's called something like "vm_mode". BUT | 19:07 |
zaitcev | BUT it's much better to junk those crazy 1.conf and 2.conf files and just use a normal setup in production. Then you never have those "unknown module" things happening in rsync. | 19:07 |
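(To illustrate the mismatch zaitcev is describing, a sketch of an rsyncd.conf in that SAIO-per-port style; the module names and paths are assumptions based on the ports quoted in this log, not ybunker's real file.)

```
# SAIO-style rsyncd.conf fragment: the module names carry the replication port,
# which is what a replicator running with the SAIO/vm_test_mode setting asks for.
[account4203]
path = /srv/node
read only = false
lock file = /var/lock/account4203.lock

[container4203]
path = /srv/node
read only = false
lock file = /var/lock/container4203.lock

# A replicator started without that setting asks for plain "account"/"container",
# and rsync logs "unknown module 'container'" because no such module is defined.
```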
zaitcev | Anyway | 19:08 |
ybunker | yeah, once i get this thing working i will then move on and take some corrective actions on the configurations, but first i need to get this thing to work :( | 19:09 |
zaitcev | Right | 19:09 |
zaitcev | oh | 19:14 |
zaitcev | I think I see it now | 19:14 |
zaitcev | In http://pasted.co/f931a29a, replicator says that it wants port 4103, right? But that port is only used for the access network, not for replication network. Look into http://pasted.co/70b963c5. All instances of 4103 are in the left column. | 19:16 |
zaitcev | And you have two sets of listeners, 3.conf listens on 4103 and r_3.conf listens on 4203. | 19:17 |
zaitcev | I can only conclude that your replicator uses 3.conf. I don't know how you start it, but it just does. Then it sees itself binding to 4103, tries to find that among the replication ports in the ring, and fails. | 19:18 |
zaitcev | You need to make sure that your replicator uses r_3.conf, or whatever conf its listener actually uses. Edit some systemd unit files or whatever. | 19:19 |
zaitcev | Or here's an even better suggestion to start | 19:20 |
zaitcev | Just don't use the replication network at first | 19:20 |
ybunker | mmm and how can i change that so it works? can i try to put it all in one config file instead of having 3 ? (it's an inherited cluster :-( ) | 19:20 |
zaitcev | Get it all running with just one network, and have rings with port and replication_port set to the same value at first. Then, once you've got that debugged, add a distinct replication network. | 19:21 |
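(As an illustration of "port and replication_port set to the same value": this is roughly how a device entry would be added with a single network; region, zone, device name and weight here are placeholders, not the real layout.)

```bash
swift-ring-builder account.builder add \
    --region 1 --zone 1 \
    --ip 10.2.1.19 --port 4103 \
    --replication-ip 10.2.1.19 --replication-port 4103 \
    --device 1 --weight 100
swift-ring-builder account.builder rebalance
```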
zaitcev | Oooh... | 19:21 |
zaitcev | So someone invented this mad config and dumped it on you. | 19:21 |
ybunker | yes!!! exactly :-( and its a pain in the... hahah | 19:22 |
zaitcev | So, how is the account replicator started? Is it swift-init, SystemD, or something else? | 19:23 |
ybunker | swift-init | 19:24 |
zaitcev | You have to find that out. There, it gets its argument... it's a path. That path must be such that it ends up reading r_3.conf instead of 3.conf. | 19:24 |
zaitcev | Hmm, wait. That one may be computing the paths. | 19:24 |
ybunker | and i see processes for 1.conf 2.conf 3.conf and r_1.conf r_2.conf and r_3.conf | 19:25 |
zaitcev | But maybe it gets confused when you have both, because it never expected that to happen. | 19:25 |
ybunker | is there a way to get rid of those r_x.conf files and just use one config file for acct, another for cont and one for obj ? | 19:26 |
ybunker | at least on the new server, and then i will try to change the rest of the cluster | 19:27 |
zaitcev | Yes, there is, but that effectively switches off the replication network and uses the front-to-back network for replication traffic. | 19:27 |
zaitcev | I'm afraid to suggest that outright, because who knows how much network capacity you're using right now. Merging them may bring the whole thing to its knees. | 19:28 |
zaitcev | Hold on, let me check if you can force replicator's port | 19:29 |
ybunker | thanks a lot zaitcev | 19:29 |
zaitcev | running vi etc/account-server.conf-sample, let's see.. | 19:29 |
zaitcev | nope, it's not possible | 19:31 |
ybunker | :-( | 19:31 |
zaitcev | ybunker: when you said "i see processes for ... r_3.conf", what does it actually include? Could it be that you have _two_ account replicators running: one for 3.conf, which does nothing, and one for r_3.conf? | 19:32 |
zaitcev | If so, then all you need is to prevent the unnecessary and confusing extra replicator from starting. | 19:34 |
ybunker | zaitcev: http://pasted.co/d36d3110 | 19:34 |
zaitcev | Ah, yes, a full set :-) | 19:35 |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 19:36 |
zaitcev | So, try to comment out [account-replicator] from 3.conf and do swift-init account-replicator stop, then start. Check that there aren't any bad messages. | 19:36 |
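(Roughly, the layout that leaves behind; the section contents here are guessed from what has been quoted in this conversation, and the real files certainly contain more settings.)

```
# 3.conf — front-side account-server, replicator section commented out:
[DEFAULT]
bind_port = 4103
# ...
#[account-replicator]
#vm_test_mode = yes

# r_3.conf — replication-side, keeps its replicator:
[DEFAULT]
bind_port = 4203
# ...
[account-replicator]
vm_test_mode = yes
```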
ybunker | ok, well the errors seem to have gone away :-) | 19:39 |
ybunker | but... still not replicating :( | 19:39 |
zaitcev | How do you know? | 19:40 |
zaitcev | number of passes remains at zero in recon? | 19:40 |
*** e0ne has quit IRC | 19:40 | |
zaitcev | Or some other method? | 19:40 |
*** e0ne has joined #openstack-swift | 19:40 | |
ybunker | looking at the swift_account_server.log log file, and listing the contents of /srv/node/{1|2|3} no accounts directory appears | 19:40 |
ybunker | in the [account-replicator] i only have vm_test_mode = yes | 19:43 |
zaitcev | Right, that's how r_3.conf is set up. That setting appends "4203" to "account". | 19:43 |
zaitcev | So, actually, if you walk through all of ?.conf and comment out replicators, you should get rid of rsync complaining about the naked "container" module. | 19:44 |
zaitcev | Although | 19:44 |
zaitcev | That actually means that one of those extra container replicators managed to find itself in the ring and tried to replicate. | 19:45 |
ybunker | oh ok i see | 19:47 |
ybunker | and on the main account-replicator.conf | 19:48 |
zaitcev | Yeah, it's a little surprising that you have both that and account/1.conf | 19:48 |
ybunker | do i need to also comment out the [account-replicator] section there? or leave it with log_facility, concurrency = 1 and vm_test_mode = yes ? | 19:48 |
zaitcev | I don't know for sure. You have to check if it owns any devices in the ring... | 19:50 |
zaitcev | If you do need it, then it must keep vm_test_mode, because that's how all those rsync listeners are set up. | 19:51 |
ybunker | the "unknown module 'container' tried from" errors continue inside the rsyncd.log | 19:52 |
zaitcev | You still have a container replicator without vm_test_mode somewhere in the cluster. Rsync should also tell you the IP of the offending node. | 19:53 |
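(One hypothetical way to do that sweep on each node: list the container-server configs that define a replicator section but never mention vm_test_mode. The directory path and the grep approach are assumptions, not a Swift tool.)

```bash
grep -rl '^\[container-replicator\]' /etc/swift/container-server/ \
    | xargs grep -L 'vm_test_mode'
```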
ybunker | zaitcev: will review all the cluster and get back tomorrow hopefully with some news :), thanks a lot zaitcev for all the help today, really appreciated! | 19:54 |
*** ybunker has quit IRC | 19:56 | |
*** e0ne has quit IRC | 20:11 | |
*** e0ne has joined #openstack-swift | 20:12 | |
*** e0ne has quit IRC | 20:15 | |
*** itlinux has joined #openstack-swift | 20:30 | |
*** itlinux has quit IRC | 20:34 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Get functional/tests.py running under py3 https://review.openstack.org/642520 | 20:38 |
*** e0ne has joined #openstack-swift | 20:56 | |
*** itlinux has joined #openstack-swift | 20:56 | |
*** e0ne has quit IRC | 21:04 | |
*** e0ne has joined #openstack-swift | 21:12 | |
*** e0ne has quit IRC | 21:38 | |
*** pcaruana has quit IRC | 21:45 | |
timburke | clayg, notmyname: you guys expressed some opinions on https://review.openstack.org/#/c/640552/ -- got some review bandwidth to second zaitcev's +2? | 21:49 |
patchbot | patch 640552 - swift - Stop monkey-patching mimetools - 5 patch sets | 21:49 |
*** itlinux has quit IRC | 21:55 | |
notmyname | timburke: so if I understand p 640552 correctly, we used to patch mimetools to default to None so we can detect when the client doesn't set it. and now we do it with the `protocol_class` that's passed in to the wsgi server | 22:42 |
patchbot | https://review.openstack.org/#/c/640552/ - swift - Stop monkey-patching mimetools - 5 patch sets | 22:42 |
notmyname | although I'm not sure where the `protocol_class` thing gets used. eventlet maybe? | 22:43 |
notmyname | also a google search for "eventlet wsgi 'protocol_class'" only shows 6 results, 5 from swift's codebase | 22:46 |
*** threestrands has joined #openstack-swift | 22:50 | |
*** tkajinam has joined #openstack-swift | 22:56 | |
mattoliverau | morning | 22:58 |
zaitcev | I went directly to "git clone eventlet" to see how Tim chose to wedge this stuff in, seemed agreeable. | 23:07 |
notmyname | ya, he just explained it to me too. looks good when you finally see where it's all being called from | 23:10 |
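(For readers without the patch open, a rough py2 sketch of the approach as described here; this is a paraphrase of the idea, not the code under review, and it assumes eventlet's HttpProtocol still honours the stdlib MessageClass hook.)

```python
from eventlet import wsgi


class NoDefaultTypeMessage(wsgi.HttpProtocol.MessageClass):
    """Header parser that records None when the client sent no Content-Type,
    instead of mimetools' silent text/plain default."""
    def parsetype(self):
        if self.typeheader is None:
            # No Content-Type header from the client: say so instead of
            # pretending it was text/plain.
            self.type = self.maintype = self.subtype = None
            self.plisttext = ''
        else:
            wsgi.HttpProtocol.MessageClass.parsetype(self)


class PatchedHttpProtocol(wsgi.HttpProtocol):
    # BaseHTTPRequestHandler instantiates MessageClass to parse request headers,
    # so overriding it here is scoped to this server instead of monkey-patching
    # the mimetools module globally.
    MessageClass = NoDefaultTypeMessage

# eventlet.wsgi.server(sock, app, protocol=PatchedHttpProtocol) would then use it.
```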
timburke | notmyname, +1?? are you a core or not!? :P | 23:17 |
*** threestrands has quit IRC | 23:45 |