Thursday, 2018-10-11

openstackgerritMerged openstack/swift master: Enabling direct_client users to overwrite X-Timestamp
openstackgerritMerged openstack/swift master: Refactor obj.server.ObjectController.PUT
openstackgerritMerged openstack/swift master: Add validation for sorting_method values
openstackgerritMerged openstack/swift master: Tighten status code assertions for object POST
openstackgerritMerged openstack/swift master: Allow EC to ignore invalid request ETag
openstackgerritMerged openstack/swift master: Pass empty etag override values through encrypter
DHEI don't know if this is a swift thing or an eventlet thing (or something else), has SO_REUSEADDR been considered for services?10:41
DHEso as a hack, SO_REUSEADDR is already in use so you can just launch, say, the object server multiple times to reap the benefits...12:23
DHEnotmyname: you may recall last week (?) I was having trouble with the reconstructor on my one-machine 30-disk lab box. This seems to have resolved it.12:58
notmynamegood morning15:59
timburkeDHE: i know there was a while ago...16:18
patchbotpatch 137659 - swift - Support of the Linux socket option SO_REUSEPORT (ABANDONED) - 8 patch sets16:18
timburkeand more recently16:19
patchbotpatch 312791 - swift - Remove unneeded setting of SO_REUSEADDR. (MERGED) - 2 patch sets16:19
timburke might also be kinda relevant?16:20
patchbotpatch 463876 - swift - Open-code eventlet.listen() (MERGED) - 2 patch sets16:20
DHEtimburke: reuseaddr is different from reuseport16:46
DHEit may help with scaling some issues16:46
DHE*some scaling issues16:47
DHEI'm trying it with the object server and things are much better16:47
DHEbut setting it is not enough. specific actions regarding socket creation are needed to take advantage.16:48
timburkebah -- sorry. i'm not actually super-familiar with socket options; just grabbing everything i kinda remember mentioning SO_REUSE*16:50
timburkei think we *do* have it set, though -- eventlet defaults it on for us:
timburkeso what are the other actions that would have to be taken?16:50
DHEthe point of SO_REUSEADDR is that it allows multiple sockets to be created each bind()ing to and listen()ing to the same IP/port.16:51
DHEthen the kernel will round robin incoming connections to each listen socket16:51
DHErather than having one listen socket and one accept() queue that multiple processes fight over (since poll() will wake them all up when any incoming connection arrives)16:52
DHEbut since the option is enabled, you can instead launch the object server with a low number of workers and just launch it multiple times to get the benefit16:52
DHEwhile I think all services could benefit, in my particular workload the object server is the one getting destroyed by load16:53
DHEdoes that make sense?16:54
DHEI guess the best description of "the action to be taken" is to fork() all your worker processes first, then have each one independently make its own listen socket with SO_REUSEADDR.16:57
DHErather than the traditional wisdom of making a listen socket, then forking your worker processes16:57
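The "fork first, then let each worker make its own listen socket" approach DHE describes can be sketched with the Python standard library alone. This is a minimal illustration, not Swift's or eventlet's actual startup code; `make_listener`, `worker_loop`, and `prefork` are hypothetical names. The discussion names SO_REUSEADDR, but on Linux (3.9 and later) the option that lets several listening sockets share one IP/port and has the kernel spread connections across them is SO_REUSEPORT, so the sketch sets both where available.

```python
import os
import socket


def make_listener(host, port, backlog=128):
    """Create a listening TCP socket that may share its port with others."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Assumption: on Linux, sharing one IP/port between several listening
    # sockets relies on SO_REUSEPORT; skip it where the platform lacks it.
    if hasattr(socket, "SO_REUSEPORT"):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def worker_loop(host, port):
    """Run in a child process: own listen socket, own accept() queue."""
    sock = make_listener(host, port)
    while True:
        conn, _addr = sock.accept()
        with conn:
            conn.sendall(b"handled by pid %d\n" % os.getpid())


def prefork(host, port, workers):
    """Fork the workers first; each child then opens its own listener."""
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            try:
                worker_loop(host, port)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids
```

Calling `prefork("127.0.0.1", 6200, 4)` (the port is only an example value) would leave four children each accepting from a separate queue, in contrast to the traditional pattern where one socket is created before `fork()` and every worker waits on the same queue.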
timburkeso i'm thinking about what happens when we find a non-partition in the replicator...18:28
timburkein particular, over around
timburkeif it's just *clearly not a partition* and the int() fails... we mark every device in the ring as failed?? that doesn't seem right...18:29
timburkebut if it's *close enough* to being a valid partition... we get back an empty list from get_part_nodes... which ends up flagging the current node as a handoff, replication trivially succeeds, and we *delete* whatever it was that we found18:30
timburke...which also doesn't seem great...18:31
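One way an empty node list can lead to "trivially succeeds, then delete" is vacuous truth: a check over zero nodes passes by default. The toy model below is only an assumption about the shape of the logic, not the replicator's actual code; `should_delete_local_copy` and the `sync` callback are hypothetical.

```python
def should_delete_local_copy(local_node_id, part_nodes, sync):
    """Toy handoff decision, not Swift's implementation.

    Delete the local copy when this node is not a primary for the
    partition and every primary was synced successfully.
    """
    # With an empty part_nodes list both all() calls are vacuously True.
    is_handoff = all(node["id"] != local_node_id for node in part_nodes)
    synced_everywhere = all(sync(node) for node in part_nodes)
    return is_handoff and synced_everywhere
```

With `part_nodes == []` the function returns True even if `sync` would fail for every real node, which mirrors the concern raised here: nothing was actually replicated, yet the data is considered safe to remove.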
timburkeso i guess my question is: what do we *want* to happen when you see a directory (or file!) named '1234' or '-12' or 'foo' when your part power says partitions should be in the range [0, 1023]?18:34
DHE1024 could happen if something goes wonky during a part power increase...18:36
notmynametimburke: I thought we had a check to simply pass over "weird" stuff when it's found. IOW an admin may have dropped a dir in there for whatever reason and we shouldn't try to parse it nor try to delete it. just ignore it18:37
notmynamebut in your case, you're talking about something that looks like a valid part number (naming-wise) but not valid according to the ring that's available18:37
DHEas an aside, what would happen if int() is given something like 0123. Are there two possible instances of the same partition?18:39
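The cases raised so far ('1234', '-12', 'foo', and '0123') can be told apart with a small check. This is a sketch under assumptions, not what Swift does; `classify_partition_dir` and its return labels are hypothetical. It relies on the fact that Python's `int()` accepts leading zeros in strings, so '0123' and '123' parse to the same integer.

```python
def classify_partition_dir(name, part_power):
    """Classify a directory name found where partitions are expected."""
    try:
        part = int(name)
    except ValueError:
        # e.g. 'foo': does not look like a partition at all
        return "not-a-partition"
    if not 0 <= part < 2 ** part_power:
        # e.g. '1234' or '-12' when part_power is 10
        return "out-of-range"
    if str(part) != name:
        # e.g. '0123': parses to 123 but is a different directory name
        return "non-canonical"
    return "valid"
```

With `part_power=10` the valid range is [0, 1023], so '1024' falls just outside it, and '0123' is flagged separately because it would otherwise alias partition 123.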
timburkenotmyname: yup. and fwiw, looks like reconstructor is more aggressive about non-int directories:
notmynamehmm.. that doesn't seem right18:40
timburkeDHE: seems like... and yeah, i'm a little worried about part-power increases, too18:40
DHE6 instances of the object-server (as launched from a shell) solves all my reconstructor random failures problems18:42
timburkealso fwiw, we apparently have tests to demonstrate that the replicator will delete invalid partitions:
notmynametimburke: a quick search didn't show the commit I was looking for where we ignore unexpected directories (I know it's somewhere though)18:48
timburkenotmyname: looks like the reconstructor behavior goes back to the start of EC:
notmynamedoesn't surprise me18:49
timburkeall i wanted to do was click +2/+A on but now i find myself trapped by ambivalence about when it's ok for swift to just *delete* data :-(19:13
patchbotpatch 402522 - swift - get_part_nodes should raise error on invalid part - 2 patch sets19:13
openstackgerritTim Burke proposed openstack/swift master: Allow arbitrary UTF-8 strings as delimiters in container listings
timburkei should probably do ^^^ for account servers, too, but one step at a time22:36
openstackgerritTim Burke proposed openstack/swift master: Add slo_validator callback
