*** dviroel|out is now known as dviroel | 11:29 | |
timss | Hi, my object nodes have a 2xSSD (RAID1 w/LVM for OS) and 64xHDD setup and I'm looking into possibly separating account/container onto leftover space of the SSD RAID for extra performance. With our previous growth the space requirement for containers seems to be fine (~85G per server), and I've been able to configure the rings accordingly. | 13:10 |
timss | Question is: with only 1 device (logical volume) per object node (at least 7), will that be enough devices for a healthy distribution, and how would one go about deciding part power etc.? Wasn't able to find any references online, although I think I've heard people run similar setups (albeit maybe with a more significant number of devices for the account/container rings) | 13:10 |
opendevreview | Alistair Coles proposed openstack/swift master: relinker: tolerate existing tombstone with same X-Timestamp https://review.opendev.org/c/openstack/swift/+/798849 | 13:10 |
DHE | as long as you have more devices than you have distributed copies you're fine. the concern comes when you add additional redundancy into the system, like multiple failure zones (racks) | 13:42 |
DHE | at 7 servers I'm guessing they're all in the same rack connected to the same switch? | 13:42 |
timss | As of this time unfortunately yes, all in the same rack | 13:42 |
timss | At least there's redundant networking and power, but it's not optimal for sure | 13:43 |
DHE | so redundancy concerns where you're giving some topology information to swift become more serious with only 7 copies and, say, 2 failure zones | 13:45 |
DHE | *7 servers | 13:45 |
timss | In this scenario there's no real difference between the servers inside the same rack, so even defining a clear failure domain is a bit tricky. From my understanding, even with 1 region and 1 zone, Swift would at least ensure all 3 replicas are spread across different servers (and their partitions). Not sure if splitting it up would help much? | 13:53 |
zaitcev | Swift spreads partitions in tiers. First to each region, then to each zone, then to each node, and finally to each device. | 14:23 |
zaitcev | This allows you to assign zones to natural failure boundaries, such as racks. | 14:24 |
zaitcev | But each tier can be degenerate: 1 region total, 1 zone total, etc. | 14:24 |
zaitcev | So, 7 nodes for replication factor 3 sounds fine to me. | 14:25 |
zaitcev | It gives room for handoff nodes beyond the strictly necessary 3 in the node tier. | 14:27 |
timss | Cheers to both. I feel like I can live with this setup, and if growth continues I would perhaps introduce another zone or region at some point, but for now it is what it is. The application of this cluster should be fine with the level of redundancy set; I was more worried about the very low number of devices than anything | 14:29 |
zaitcev | Is your replication factor 3 for the container and account rings? | 14:30 |
timss | That's the plan | 14:30 |
zaitcev | Sounds adequate to me. | 14:31 |
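To make the tier description above concrete, here is a minimal sketch of building such a degenerate-tier account ring: one region, one zone, seven nodes with a single account/container LV each, replication factor 3. The IP addresses, port, device name, weight, and the part power of 13 are assumed values for illustration, not figures from the discussion.

```python
# Minimal sketch, assuming hypothetical addresses and a part power of 13.
from swift.common.ring import RingBuilder

builder = RingBuilder(part_power=13, replicas=3, min_part_hours=1)
for i in range(7):
    builder.add_dev({
        'region': 1, 'zone': 1,          # degenerate tiers: 1 region, 1 zone
        'ip': '10.0.0.%d' % (11 + i),    # assumed node addresses
        'port': 6202,                    # default account-server port
        'device': 'd0',                  # the LV carved from the SSD RAID
        'weight': 100,
    })
builder.rebalance()
builder.save('account.builder')
builder.get_ring().save('account.ring.gz')
```

With only one region and one zone, the placement guarantee described above degenerates to "unique nodes", so the seven servers still keep the three replicas apart from one another.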
timss | Next up would be to decide the partition power for the account/container rings. I've usually seen at least 100-1000 partitions per device recommended, but clayg's pessimistic part power calculation even recommends as much as ~7k at a part power of 14. Perhaps the general recommendations don't play well with a very low number of devices, but dunno | 14:33 |
clayg | the recommendation was more "on the order of 1K" - so 2-3, maybe 5-6 is fine but >10 starts to look sketchy even if you are "planning" for some growth | 14:35 |
clayg | now that I have more experience with part power increase I wonder if my recommendations about picking a part power may have changed (for objects at least; AFAIK no one has attempted an a/c PPI) | 14:36 |
zaitcev | I never understood economizing on partitions. The more, the better for replication. The biggest clusters can have issues like having too many inodes, which auditors and replicators constantly refresh in the kernel. If you have adequate RAM to hold the inodes and the rings, what's the downside? | 14:37 |
zaitcev | Is there a problem with replicator passes taking too long? | 14:38 |
timss | oh, seems I summoned the man himself involuntarily :D | 14:38 |
timss | back to object rings primarily now, but I'm curious what made some folks over at Rackspace recommend more on the scale of ~200 partitions per drive in their ring calculation tool https://rackerlabs.github.io/swift-ppc/ | 14:46 |
timss | I've been running ~6k partitions per device (pp 19) on a previous installation for years which has been going ok, but replication performance isn't the best (probably more factors to it, it hasn't gotten that much love) | 14:48 |
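For reference, here is the back-of-the-envelope arithmetic behind those rules of thumb applied to a 7-device ring; the per-device targets are just the figures mentioned above (the Rackspace tool's ~200 per drive and the "on the order of 1K" recommendation), nothing cluster-specific.

```python
# Worked example: part power sizing for 7 devices at two per-device targets.
import math

devices = 7
for target_per_dev in (200, 1000):
    part_power = math.ceil(math.log2(devices * target_per_dev))
    parts_per_dev = 2 ** part_power / devices
    print(target_per_dev, part_power, round(parts_per_dev))
# 200  -> part power 11 (~293 partitions per device)
# 1000 -> part power 13 (~1170 partitions per device)
```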
opendevreview | Alistair Coles proposed openstack/swift master: relinker: don't bother checking for previous tombstone links https://review.opendev.org/c/openstack/swift/+/798914 | 15:10 |
opendevreview | Hitesh Kumar proposed openstack/swift-bench master: Migrate from testr to stestr https://review.opendev.org/c/openstack/swift-bench/+/798941 | 18:13 |
timburke | anybody care much about swift-bench? looks like ~a year ago i proposed we drop py2 for it: https://review.opendev.org/c/openstack/swift-bench/+/741553 | 19:24 |
*** dviroel is now known as dviroel|out | 20:41 | |
zaitcev | I would not mind. It's a client, isn't it? Surely new test runs for it run on new installs. No data gravity. | 20:45 |
opendevreview | Tim Burke proposed openstack/swift master: reconciler: Tolerate 503s on HEAD https://review.opendev.org/c/openstack/swift/+/796538 | 20:45 |
zaitcev | Well I can imagine benching from an ancient kernel in case there's an anomaly in a new one. | 20:46 |
zaitcev | But frankly I suspect the time for that is in the past. | 20:46 |
kota | good morning | 20:56 |
timburke | o/ | 20:57 |
kota | timburke: o/ | 20:58 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Jun 30 21:00:37 2021 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift meeting? | 21:00 |
kota | o/ | 21:01 |
acoles | o/ | 21:01 |
timburke | pretty sure mattoliver is out sick -- we'll see if clayg and zaitcev end up chiming in later ;-) | 21:03 |
zaitcev | o/ | 21:04 |
timburke | as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift | 21:04 |
timburke | #topic swift-bench and py2 | 21:04 |
timburke | so a while back i proposed that we drop py2 support from swift-bench: https://review.opendev.org/c/openstack/swift-bench/+/741553 | 21:05 |
timburke | ...and then i promptly forgot to push on getting it merged at all :P | 21:05 |
timburke | i saw that there's a new patch up for swift-bench (https://review.opendev.org/c/openstack/swift-bench/+/798941) -- and the py2 job seems broken | 21:06 |
kota | i see. it's updated in Jul 2020 | 21:06 |
timburke | so i thought i'd check in to see whether anyone objects to dropping support there | 21:07 |
timburke | sounds like i'm good to merge it :-) | 21:09 |
kota | +1 | 21:10 |
timburke | on to updates! | 21:10 |
timburke | #topic sharding | 21:10 |
timburke | it seems like acoles and i are getting close to agreement on https://review.opendev.org/c/openstack/swift/+/794582 to prevent small tail shards | 21:11 |
timburke | were there any other follow-ups to that work we should be paying attention to? or other streams of work related to sharding? | 21:11 |
acoles | IIRC mattoliver had some follow up patch(es) for tiny tails but I don't recall exactly what | 21:13 |
acoles | maybe to add an 'auto' option, IDK | 21:14 |
timburke | sounds about right. and there's the increased validation on sharder config options -- https://review.opendev.org/c/openstack/swift/+/797961 | 21:16 |
timburke | i think that's about it for sharding -- looking forward to avoiding those tail shards :-) | 21:17 |
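For context, the "small tail shard" problem comes from the remainder left over when a container's rows are carved into shard ranges of roughly rows_per_shard each. A rough sketch of the general idea only, not the actual patch under review; minimum_shard_size is an assumed knob name for illustration:

```python
# Sketch: fold an undersized final remainder into the previous shard range
# instead of emitting a tiny trailing shard.
def plan_shard_sizes(total_rows, rows_per_shard, minimum_shard_size):
    sizes = []
    remaining = total_rows
    while remaining > 0:
        if remaining <= rows_per_shard:
            if sizes and remaining < minimum_shard_size:
                sizes[-1] += remaining      # merge tiny tail into last shard
            else:
                sizes.append(remaining)
            break
        sizes.append(rows_per_shard)
        remaining -= rows_per_shard
    return sizes

# e.g. plan_shard_sizes(2_050_000, 1_000_000, 100_000) -> [1000000, 1050000]
```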
timburke | #topic relinker | 21:17 |
timburke | we (nvidia) are currently mid part-power increase | 21:18 |
timburke | and acoles wrote up https://bugs.launchpad.net/swift/+bug/1934142 while investigating some issues we saw | 21:18 |
timburke | basically, the reconciler has been busy writing out tombstones everywhere, which can cause some relinking errors as multiple reconcilers can try to write the same tombstone at the same time | 21:20 |
acoles | we're fortunate that the issue has only manifested with tombstones, as a result of the circumstances of the reconciler workload we had and the policy for which we were doing part power increase | 21:21 |
zaitcev | Oh I see. I was just thinking about it. | 21:21 |
acoles | it's relatively easy to reason about tolerating a tombstone with a different inode; data files would probably require more validation than just 'same filename' | 21:22 |
timburke | a fix is currently up at https://review.opendev.org/c/openstack/swift/+/798849 that seems reasonable, with a follow-up to remove some now-redundant checks at https://review.opendev.org/c/openstack/swift/+/798914 | 21:22 |
acoles | timburke: if we feel happy about the follow up I reckon I should squash the two | 21:23 |
acoles | we're basically relaxing the previous checks rather than adding another | 21:24 |
timburke | i think i am, at any rate. i also think i'd be content to skip getting the timestamp out of metadata | 21:24 |
acoles | yeah, that was my usual belt n braces :) | 21:25 |
timburke | surely the auditor includes a timestamp-from-metadata vs timestamp-from-file-name check, right? | 21:26 |
acoles | idk | 21:26 |
acoles | ok i'll rip out the metadata check and squash the two | 21:27 |
timburke | 👍 | 21:28 |
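For readers following along, the shape of the relaxed check is roughly as sketched below. This is not the code in the patch, just an illustration: an EEXIST from os.link() during relinking already implies a file with the same name, and hence the same X-Timestamp, is in place, so for tombstones that can be treated as success rather than requiring the same inode.

```python
# Rough illustration only; not the patch under review.
import errno
import os


def relink_file(old_path, new_path):
    """Hard-link old_path into its new partition location."""
    try:
        os.link(old_path, new_path)
        return True
    except OSError as err:
        if err.errno != errno.EEXIST:
            raise
        if new_path.endswith('.ts'):
            # A file with the same name (same X-Timestamp) already exists;
            # for tombstones a different inode is tolerated, since
            # concurrent reconcilers can race to write the same one.
            return True
        if os.path.samefile(old_path, new_path):
            return True    # already hard-linked to the same inode
        raise
```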
timburke | #topic dark data watcher | 21:28 |
timburke | i saw acoles did some reviews! | 21:28 |
zaitcev | Yes | 21:28 |
timburke | thanks :-) | 21:28 |
acoles | yes! | 21:28 |
zaitcev | Indeed. | 21:28 |
acoles | well just one | 21:28 |
acoles | iirc i was happy apart from some minor fixes | 21:29 |
zaitcev | I squashed that already but now I'm looking at remaining comments, like the one about when X-Timestamp is present and if an object can exist without one. | 21:30 |
acoles | zaitcev: i think its ok, the x-timestamp should be there if the auditor passes the diskfile to watcher | 21:31 |
timburke | and if the auditor *doesn't* check for it, it *should* and idk that the watcher necessarily needs to be defensive against it being missing | 21:32 |
zaitcev | ok | 21:33 |
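The idea being discussed, roughly sketched below. grace_age is an assumed option name, and a real implementation would use Swift's timestamp helpers rather than a bare float(); the point is only that the watcher can skip very recent objects whose container listings may not have caught up yet.

```python
# Concept sketch only: the auditor is expected to hand over diskfile
# metadata that already carries X-Timestamp, per the discussion above.
import time


def too_young_to_judge(metadata, grace_age=3600):
    x_timestamp = float(metadata['X-Timestamp'])
    return time.time() - x_timestamp < grace_age
```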
timburke | all right, that's all i had to bring up | 21:34 |
timburke | #topic open discussion | 21:34 |
timburke | what else should we be talking about? | 21:34 |
zaitcev | Hackathon :-) | 21:35 |
timburke | i love that idea -- unfortunately, i don't think it's something we can do yet | 21:38 |
timburke | short of a virtual one, at any rate | 21:38 |
kota | exactly | 21:39 |
opendevreview | Pete Zaitcev proposed openstack/swift master: Make dark data watcher ignore the newly updated objects https://review.opendev.org/c/openstack/swift/+/788398 | 21:39 |
timburke | speaking of -- looks like we've got dates for the next PTG: http://lists.openstack.org/pipermail/openstack-discuss/2021-June/023370.html | 21:39 |
timburke | Oct 18-22, still all-virtual | 21:39 |
acoles | ack | 21:40 |
* kota will register it | 21:41 | |
zaitcev | I'm just back from a mini vacation at South Padre. Seen a few people in masks. Maybe one in 20. | 21:41 |
timburke | yeah, but you're in TX ;-) | 21:42 |
zaitcev | The island is overflowing. I guess international vacationing is still not working. People even try to surf, although obviously the waves are pitiful in the Gulf absent a storm. | 21:42 |
timburke | i just checked; my company's guidelines for travel are currently matching their guidelines for office re-opening, which is "not yet" | 21:42 |
zaitcev | ok | 21:43 |
timburke | all right, let's let kota get on with his morning :-) | 21:43 |
acoles | is the us even allowing aliens in ? without quarantine? | 21:43 |
timburke | thank you all for coming, and thank you for working on swift! | 21:44 |
timburke | #endmeeting | 21:44 |
opendevmeet | Meeting ended Wed Jun 30 21:44:23 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:44 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2021/swift.2021-06-30-21.00.html | 21:44 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2021/swift.2021-06-30-21.00.txt | 21:44 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2021/swift.2021-06-30-21.00.log.html | 21:44 |
timburke | acoles, it looks like they probably wouldn't let you in: https://www.cdc.gov/coronavirus/2019-ncov/travelers/from-other-countries.html :-( | 21:48 |
clayg | sorry i missed the meeting; scrollback all looks good 👍 | 21:58 |
opendevreview | Merged openstack/swift-bench master: Drop testing for py27 https://review.opendev.org/c/openstack/swift-bench/+/741553 | 23:54 |
opendevreview | Tim Burke proposed openstack/swift-bench master: Switch to xena jobs https://review.opendev.org/c/openstack/swift-bench/+/741554 | 23:56 |