timburke | if you don't have the disk yet and know you won't for a while, you can just remove the device from any rings it was participating in and rebalance to reassign its partitions. replicators will have some work to do, but after a cycle or two everything should settle a bit | 00:02 |
---|---|---|
*** baojg has joined #openstack-swift | 00:03 | |
timburke | even without the ring change, objects should be fully durable as long as you unmount the drive - replication should put an extra copy on the first handoff | 00:03 |
donnyd | I have a replacement (or 20) on hand | 00:05 |
donnyd | I have 3 swift servers.. but with my current air handler can only have two of them on | 00:05 |
donnyd | which is a bummer... my old air handler let me have 6 of them turned on without any issues | 00:06 |
*** hoonetorg has joined #openstack-swift | 00:06 | |
donnyd | just don't want to blow anything up.. I can make a reasonable assumption people are actually using the data in FN for real things... and its integrity is a priority for me | 00:07 |
*** gyee has quit IRC | 00:07 | |
*** baojg has quit IRC | 00:08 | |
timburke | entirely reasonable. replacing the drive should be very smooth | 00:10 |
timburke | oh hey! https://docs.openstack.org/swift/latest/admin_guide.html#handling-drive-failure | 00:10 |
timburke | kinda sounds like we recommend removing the failed device, rebalancing, then adding the replacement as new... which is a bit different than i was expecting; i wonder if rledisez or alecuyer have any insight on what works best for them... | 00:13 |
*** baojg has joined #openstack-swift | 00:24 | |
*** NM has quit IRC | 00:28 | |
mattoliverau | can't you also use `swift-recon --expirer` to manually check? or even hit your node's expirer API to get info. It's also what my patches to monasca do.. now if they'd only land. | 00:32 |
donnyd | thanks timburke | 00:33 |
*** BjoernT has joined #openstack-swift | 01:40 | |
*** BjoernT has quit IRC | 01:43 | |
openstackgerrit | Matthew Oliver proposed openstack/swift master: Auto-sharding: Initial steps https://review.opendev.org/667030 | 03:28 |
mattoliverau | just a rebase (from the UI) let's hope it worked :) | 03:28 |
*** tkajinam has joined #openstack-swift | 03:49 | |
openstackgerrit | Matthew Oliver proposed openstack/swift master: Auto-sharding: Initial steps https://review.opendev.org/667030 | 03:54 |
openstackgerrit | Matthew Oliver proposed openstack/swift master: sharding: first attempt at _elect_leader https://review.opendev.org/667579 | 03:54 |
openstackgerrit | Matthew Oliver proposed openstack/swift master: auto-sharding: send shard-ranges via container UPDATE https://review.opendev.org/672650 | 03:54 |
mattoliverau | This time I did the rebase myself ^^ | 03:55 |
timburke | man, we really ought to try to standardize our recon dumps... | 04:00 |
timburke | just looking for basic, common things like cycle time and end of last cycle, i put together http://paste.openstack.org/show/781847/ | 04:01 |
timburke | several daemons don't meaningfully capture any of this data (account reaper, container reconciler, container sync, object auditor) | 04:02 |
timburke | something like half the time, cycle-time names include account/container/object, despite it going to a file like object.recon | 04:03 |
timburke | "pass completed", "sweep", "time", "pass" all mean approximately the same thing | 04:04 |
mattoliverau | yeah, it being standardised would be awesome, and would simplify p 583876 so it could be a lot more generic, and maintainable | 04:04 |
patchbot | https://review.opendev.org/#/c/583876/ - monasca-agent - Add swift_recon check plugin to monasca - 1 patch set | 04:05 |
timburke | poor ho never did come back to https://review.opendev.org/#/c/270014/ | 04:05 |
patchbot | patch 270014 - swift - Fix time unit of Recon's replication_time for object - 1 patch set | 04:05 |
mattoliverau | speaking of which I should go give that patch some love now that someone has actually reviewed it from monasca :P | 04:05 |
mattoliverau | :( | 04:05 |
timburke | \o/ reviews! progress! | 04:06 |
timburke | at least we're pretty consistent about using "last" to mean the timestamp for last cycle completion | 04:10 |
timburke | but then we have "expired_last_pass" which tracks the number of objects actually deleted | 04:11 |
timburke | i can see how that makes sense. but taken as a whole, it's definitely confusing | 04:11 |
mattoliverau | it's interesting, you can kinda see the evolution. All the replicators or daemons based off replicators are the ones that have the _last for the end time. All daemons running on cycles really should have a last, a cycle time is good too, but knowing when it's complete kinda feels like a must. | 04:17 |
mattoliverau | But guess I don't have to OP Swift clusters as much as I'd like, so maybe I'm wrong. | 04:18 |
*** fungi has quit IRC | 04:21 | |
timburke | mattoliverau, i agree entirely. without the last, you have no idea how stale that cycle time is :-/ | 04:22 |
mattoliverau | Well lucky for us adding missing items is easier than renaming. | 04:22 |
timburke | :D | 04:23 |
*** fungi has joined #openstack-swift | 04:26 | |
*** psachin has joined #openstack-swift | 05:59 | |
*** ccamacho has quit IRC | 06:35 | |
*** rcernin has quit IRC | 07:05 | |
*** tesseract has joined #openstack-swift | 07:10 | |
baffle | tdasilva: That's pretty good timing. I just readded all devices, worked great. 😁 | 07:13 |
*** rdejoux has joined #openstack-swift | 07:24 | |
*** ccamacho has joined #openstack-swift | 07:26 | |
*** ccamacho has quit IRC | 07:27 | |
*** ccamacho has joined #openstack-swift | 07:27 | |
alecuyer | timburke: we keep spare disks ready on some machines in each cluster. When a disk fails the ring gets changed to point at a spare. then there's time to replace the failed disk | 07:27 |
*** mvkr has quit IRC | 07:30 | |
*** mvkr has joined #openstack-swift | 07:43 | |
*** mikecmpbll has joined #openstack-swift | 08:01 | |
*** mvkr has quit IRC | 08:21 | |
*** tkajinam has quit IRC | 08:30 | |
*** e0ne has joined #openstack-swift | 08:31 | |
*** mvkr has joined #openstack-swift | 08:34 | |
*** rpittau|afk is now known as rpittau | 08:38 | |
*** e0ne has quit IRC | 08:52 | |
*** e0ne has joined #openstack-swift | 09:13 | |
*** pcaruana has joined #openstack-swift | 09:30 | |
*** rcernin has joined #openstack-swift | 10:53 | |
*** tomha has joined #openstack-swift | 12:06 | |
*** tomha has quit IRC | 12:20 | |
*** NM has joined #openstack-swift | 12:34 | |
*** pcaruana has quit IRC | 12:53 | |
*** rcernin has quit IRC | 13:19 | |
*** mikecmpbll has quit IRC | 13:33 | |
*** diablo_rojo has joined #openstack-swift | 14:01 | |
*** NM has quit IRC | 14:06 | |
*** NM has joined #openstack-swift | 14:19 | |
*** pcaruana has joined #openstack-swift | 14:46 | |
*** gyee has joined #openstack-swift | 15:21 | |
*** rpittau is now known as rpittau|afk | 15:47 | |
timburke | good morning | 15:52 |
*** e0ne has quit IRC | 15:55 | |
*** tesseract has quit IRC | 16:08 | |
*** rdejoux has quit IRC | 16:18 | |
clayg | hot spare drives!!! | 16:38 |
clayg | timburke: do you have any idea how these tests are successfully using the null byte in query args -> https://review.opendev.org/#/c/682138/8/test/unit/account/test_server.py | 16:39 |
patchbot | patch 682138 - swift - Allow internal clients to use null namespace - 8 patch sets | 16:39 |
clayg | but I can't seem to get it to work in the like filter? | 16:40 |
clayg | i guess it might not be the query args code - but instead the code handling like that barfs 🤔 | 16:43 |
timburke | yeah, grabbing sqlite source now... | 16:43 |
clayg | "The result of expressions involving strings with embedded NULs is undefined." Fuh. http://www.sqlite.org/c3ref/bind_blob.html | 16:54 |
timburke | whoa | 16:58 |
timburke | > The sqlite3_create_function() interface can be used to override the like() function and thereby change the operation of the LIKE operator. | 16:58 |
timburke | i don't think we use LIKE anywhere else.... hmm.... | 16:59 |
openstackgerrit | Thiago da Silva proposed openstack/swift master: WIP: New Object Versioning mode https://review.opendev.org/682382 | 17:00 |
timburke | whee! https://sqlite.org/lang_corefunc.html#quote | 17:01 |
timburke | > Strings with embedded NUL characters cannot be represented as string literals in SQL and hence the returned string literal is truncated prior to the first NUL. | 17:01 |
timburke | makes me wonder whether the prefix tests are actually testing everything we want... | 17:07 |
clayg | yes, i 100% agree - i feel like if that's actually what was happening I'd be able to demonstrate it trivially with these marker and prefix tests - but they're *working* | 17:07 |
clayg | i'm so confused | 17:07 |
clayg | well, not that confused - i mean all the documentation is telling me "stop; don't do this; it's not supported; you'll end up maintaining sqlite" - but I'm like *we're so CLOSE!!!* | 17:08 |
clayg | I'm also looking at if there's anything we could do with that range of bytes that's our weird outlawed utf8 i.e. '%d8' | 17:09 |
timburke | it's gonna get harder/weirder -- the sorting isn't going to be in our favor | 17:11 |
timburke | the beautiful thing about NUL was that it's *so early* when sorting | 17:11 |
timburke | *maybe* the separation between archive and primary containers can save us a bit? like, store with some non-utf8 byte, then replace all of them with nulls in time for us to do our interleaving? idk... feels like the elegance is slipping away... | 17:14 |
clayg | yup | 17:17 |
clayg | 😭 | 17:17 |
*** ccamacho has quit IRC | 17:29 | |
*** lbragstad has joined #openstack-swift | 17:32 | |
timburke | clayg, good find on the set_trace_callback() func -- definitely helpful as i play with this. but i'm starting to wonder how well it works, in light of the other logging issues i've seen... | 17:36 |
timburke | in particular, if i drop a self.fail() at the end of test_prefix_with_null(), i see a query like | 17:37 |
timburke | WHERE name < 'null' AND name >= 'null' AND deleted = 0 | 17:37 |
timburke | which really shouldn't return anything | 17:37 |
lbragstad | o/ hi folks - i'm having some difficulty generating a temp url, but i think i'm following all the right steps, at least based on what i found in documentation (this is what i've done so far: https://pasted.tech/pastes/88360ef2441f66fc3be37d3afbce7335ffca5f46.raw ) | 17:39 |
timburke | clayg, are we *sure* we can't claw back the \x01-\x08 namespace, similar to how we grabbed the leading . in the account namespace? | 17:40 |
timburke | lbragstad, is delay_auth_decision enabled in the auth_token middleware? i don't think tempurl works without it | 17:42 |
lbragstad | timburke good question - let me check quick | 17:43 |
timburke | pretty sure other features will break, too -- staticweb, formpost, anonymous access... | 17:44 |
lbragstad | ok - interesting... i am noticing 401s in my swift.log for other requests (service-to-service), too | 17:45 |
lbragstad | i don't see delay_auth_decision set in /etc/swift/proxy-server.conf | 17:46 |
lbragstad | but i do see ksm's auth_token middleware in the pipeline | 17:46 |
timburke | lbragstad, default is false; i think you'll need to explicitly enable it | 17:47 |
lbragstad | ok - i'm seeing several configs, but i assume proxy-server.conf is the one i need to edit? | 17:48 |
timburke | yep; all auth decisions are handled at the proxy (for better or worse...) | 17:49 |
lbragstad | ok - i enabled that an bounced all the swift service, still no luck though ( i generated a new tempurl with swift tempurl and used curl directly) | 17:53 |
lbragstad | s/an/and/ | 17:53 |
lbragstad | new paste https://pasted.tech/pastes/452d4abb2701053fe2c22926c7b43fea57c7d9e1.raw | 17:55 |
lbragstad | i used `swift post -m "Temp-URL-Key:MYKEY"` earlier to set my key, and that appears to have worked because i can see it when i list my account information | 18:00 |
timburke | oh! just noticed that the tempurl was generated for a GET, but then you used it for a PUT... mind trying it as a GET (or HEAD)? | 18:06 |
lbragstad | from what i can see in https://docs.openstack.org/api-ref/object-store/?expanded=create-or-replace-object-detail,list-activated-capabilities-detail,show-account-details-and-list-containers-detail#create-or-replace-object and https://docs.openstack.org/swift/latest/api/temporary_url_middleware.html that should be all i need to generate a temp url, right? | 18:06 |
lbragstad | so - that was my next question :) | 18:07 |
lbragstad | i was wondering if the `swift tempurl` bit was supposed to take the method you intended to use or the method that's actually used in the request | 18:07 |
lbragstad | if i use `swift tempurl GET` to generate a tempurl for a GET request (allowing temporary access to a tempurl) - how do you set that on an object? | 18:08 |
lbragstad | timburke ack - setting PUT in the tempurl worked... https://pasted.tech/pastes/cf77d04cac53a93dd09c8701a03338c43333c172.raw | 18:10 |
timburke | nothing is stored on the object -- it's just a decision made based on the account or container metadata (to get the key), request method, request path, request expiration, and server timestamp | 18:10 |
timburke | 👍 | 18:10 |
lbragstad | ok - so the method to get the signature doesn't attribute to access in the server? | 18:11 |
lbragstad | e.g., using `swift tempurl PUT` will still allow people with the tempurl to get the contents of that object using `curl -X GET $tempurl`? | 18:12 |
timburke | no, a PUT tempurl won't let you GET. it *will* let you HEAD, though (i suppose, so you can check whether the upload's already been completed?) | 18:14 |
lbragstad | aha | 18:15 |
lbragstad | here i was trying to use `swift tempurl` to generate tempurls to _create_ tempurls | 18:15 |
lbragstad | so - i think that's where my hangup was | 18:15 |
lbragstad | because i was using it to generate requests i wanted to make in the future (e.g., i want temporary access to allow people to GET this thing) | 18:16 |
lbragstad | and then i tried putting that into a PUT request to _create_ that URL... but the signatures obviously won't match | 18:16 |
timburke | cool! yeah, the one tempurl should be enough, then. though i suppose there may be some value in varying the expiry slightly for fingerprinting... | 18:18 |
lbragstad | ok - that's what i was wondering because i have a deployment with tempurls that expire after a year | 18:18 |
timburke | hmm... that long of a window will likely make it hard to rotate keys... | 18:19 |
lbragstad | so - i was trying to figure out how expiration was set on a tempurl for an object and i assumed it was something you set on the put request when you created that object | 18:19 |
clayg | timburke: i was also thinking about trying to claim back some of the lower byte namespace... i was curious where the s3 allowed names bottom out... part of me is scared I'll end up suggesting a v2 api and restricting object versioning to only that and s3api | 18:19 |
lbragstad | or is temp_url_expires not a settable thing on an object - was i interpreting that wrong? | 18:23 |
timburke | clayg, i know we've got some logic in s3api to get headers back and forth between quoted-printable (see https://tools.ietf.org/html/rfc1521.html) ... idk about name restrictions, though | 18:25 |
timburke | lbragstad, temp_url_expires is purely a property of the request -- nothing gets stored with the object | 18:26 |
lbragstad | aha | 18:26 |
timburke | proxy uses it; i don't think object server even sees it | 18:26 |
lbragstad | here i was thinking you could set the expiration of it | 18:26 |
lbragstad | s/it/the temp url/ | 18:27 |
timburke | so: there's no way of knowing what tempurls have been generated for a particular object | 18:27 |
lbragstad | if temp url expiration (TTL) is only a thing clients send, how does the server use it? | 18:27 |
lbragstad | i guess i'm trying to understand the usecase | 18:28 |
timburke | server needs it (1) to calculate the same signature as the client provided -- if the expiry doesn't match what was used when the client created the signature, the signature can't match -- and (2) to compare against the server time -- if the server time is past the expiry, it's ipso facto invalid | 18:30 |
clayg | timburke: 😬 https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-key-guidelines-special-handling | 18:31 |
lbragstad | ok - so temp_url_expires doesn't limit access at all | 18:32 |
timburke | clayg, "likely need to be URL encoded" doesn't give me much hope... looks like we need to test | 18:33 |
clayg | yeah, i'm working on that now | 18:33 |
timburke | lbragstad, what do we mean by "limit access"? after an hour (give or take, if the client's reasonably in-sync with the cluster), either of those tempurls would be invalid and attempting to use them would yield only 401s | 18:36 |
lbragstad | oh - it's a threshold? | 18:36 |
lbragstad | again - i'm sorry, i have my wires cross and i'm thinking about expiration differently | 18:38 |
lbragstad | crossed* | 18:38 |
timburke | yes -- you can test this by using --absolute to specify a date in the past | 18:38 |
timburke | heh, sorry -- we further the confusion a bit with our tempurl expiration and object expiration -- which are entirely orthogonal concepts | 18:39 |
lbragstad | i was confused because i thought 1.) temp_url_expires was set somewhere on the object 2.) if i set it to something like a year in the future, in 366 days i won't be able to use that temp_url anymore | 18:39 |
timburke | 2 is true, 1 is not | 18:39 |
clayg | so using boto3 at least giving it quoted names just resulted in objects *named* '%00' and '%FF' etc | 18:41 |
clayg | giving it the bytes worked (in the listing "Key": "\u0001") - but for \x00 i get an error -> An error occurred (400) when calling the PutObject operation: Bad Request Unable to create key '\x00' | 18:42 |
timburke | (and if you rotate your keys monthly, say, you won't even be able to use the tempurl for the full year) | 18:42 |
timburke | clayg, i'm getting more and more curious about what the actual bytes-on-the-wire look like... | 18:43 |
lbragstad | timburke ok - interesting, i think i get it now... | 18:44 |
timburke | lbragstad, any feedback on what we could say in docs to make the mental-model more obvious? improvements always welcome and all ;-) | 18:45 |
lbragstad | so the temp_url usage is obviously underpinned by the key used to sign it and the expiration it was given when the temp_url was "created" | 18:45 |
lbragstad | created == signed with entropy (from uuid) | 18:46 |
lbragstad | timburke yeah - i'll think about this a bit more and re-read the docs | 18:46 |
lbragstad | timburke clayg thanks for the help, i really appreciate it | 18:47 |
timburke | thanks :-) | 18:47 |
timburke | lbragstad, "with entropy" and "from uuid" give me some pause, though -- do you mean from the object name? or from the key? or from something else? | 18:48 |
lbragstad | timburke nevermind - https://github.com/openstack/python-swiftclient/blob/2fcd4d872713dc30e7352845c37515280f1d21ab/swiftclient/utils.py#L179 | 18:52 |
lbragstad | i didn't fully read that | 18:52 |
timburke | in all fairness, you shouldn't need to *read the source* to understand what's going on ;-) | 18:53 |
lbragstad | after rereading the temp_url_sig - it's clear | 18:54 |
timburke | clayg, this kinda makes me wish we had https://review.opendev.org/#/c/212824/ -- it should be pretty easy to write an audit-watcher that just scribbles down names that include chars in the \x01-\x08 range.... | 19:04 |
patchbot | patch 212824 - swift - Let developers/operators add watchers to object audit - 12 patch sets | 19:04 |
clayg | timburke: it's not obvious to me how that would be helpful... just for like finding out if such names exist? | 19:31 |
*** psachin has quit IRC | 19:33 | |
timburke | clayg, yeah, mainly just having something we could have rledisez (for example) run to see if this even passes the sniff test | 19:36 |
rledisez | timburke: sure, if you want us to scan our disks a bit, just tell us (no need to merge the patch, it can even be a quick&dirty script with enough safeguards to not suck up all the IO) | 19:41 |
clayg | timburke: well the problem is also that s3 allows these characters in key names - everything except \x00 | 19:43 |
timburke | clayg, good to know -- but how many s3 *clients* use them? | 19:44 |
dcourtoi | hello, should we consider that not being able to use hostnames in rings instead of IP addresses while using servers_per_port is a bug ? | 19:54 |
dcourtoi | (it works if servers_per_port = 0) | 19:55 |
clayg | dcourtoi: i'm a little surprised it works with servers_per_port=0, i'm curious what breaks regardless - do you have a stack trace or something? | 20:02 |
dcourtoi | I don't have any stack trace; when I'm not using the servers_per_port feature, it works. But when I enable servers_per_port, the ip/hostnames in the ring are compared to what common.utils.whataremyips() returns, and it always returns IP addresses. So if we put hostnames in the ring the object-server process hangs without logging anything, indefinitely looking for a match between the hostname | 20:12 |
dcourtoi | and whataremyips return value | 20:12 |
*** e0ne has joined #openstack-swift | 20:23 | |
*** e0ne has quit IRC | 20:28 | |
clayg | dcourtoi: i bet the issue is either IN common.ring.utils.is_local_device or it's just the cardinality of calling socket.getaddrinfo | 20:30 |
*** pcaruana has quit IRC | 20:35 | |
clayg | dcourtoi: like on my machine whataremyips returned different values than socket.getaddrinfo - it's possible that with some configuration tweaking it could be made to work | 20:40 |
clayg | i think the reason it's not better "supported" with better docs, validation, and error messages is because not many folks have tried to set things up this way - if you can get it working it'd be much easier to know what bug to open and how to fix it | 20:41 |
dcourtoi | from what I saw whataremyips always returns IP addresses, and in common.storage_policy dev['ip'] are compared to those IP addresses. I was able to make the object server start by forcing whataremyips to return the hostname when the hostname resolution failed (when socket.gaierror.errno == -2). I'll continue digging tomorrow | 20:48 |
dcourtoi | the hostname resolution fails because socket.AI_NUMERICHOST is passed to socket.getaddrinfo in whataremyips() | 20:51 |
dcourtoi | to be continued... | 20:52 |
*** NM has quit IRC | 21:08 | |
openstackgerrit | Tim Burke proposed openstack/swift master: WIP: New Object Versioning mode https://review.opendev.org/682382 | 21:08 |
clayg | oic, somewhere we look at the ips instead of just is_local_device | 21:15 |
clayg | timburke: i'm extracting the specific character we use to a constant RESERVED_BYTE and updating tests and code | 21:15 |
clayg | i'm not sure that's helpful - but my assumption is we may decide "fuck it we'll just use \x01\x01" | 21:16 |
clayg | 😞 | 21:16 |
*** NM has joined #openstack-swift | 21:22 | |
*** NM has quit IRC | 21:29 | |
timburke | clayg, hmm... the signatures on things like patternCompare in https://www3.sqlite.org/cgi/src/artifact/ed33e38cd6420581 make me pretty nervous about trying to use NUL in a LIKE... | 22:09 |
*** gyee has quit IRC | 22:14 | |
*** rcernin has joined #openstack-swift | 22:26 | |
*** patchbot has quit IRC | 22:27 | |
*** patchbot has joined #openstack-swift | 22:31 | |
*** tkajinam has joined #openstack-swift | 23:00 | |
*** joeljwright has quit IRC | 23:06 | |
*** joeljwright has joined #openstack-swift | 23:07 | |
*** ChanServ sets mode: +v joeljwright | 23:07 | |
*** hoonetorg has quit IRC | 23:12 | |
*** diablo_rojo has quit IRC | 23:17 | |
*** hoonetorg has joined #openstack-swift | 23:25 | |
*** tkajinam has quit IRC | 23:39 | |
*** tkajinam has joined #openstack-swift | 23:40 | |
mattoliverau | morning | 23:42 |
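
A minimal sketch of the tempurl check timburke describes at 18:10 and 18:30: the server recomputes a signature from the key, request method, expiry and path, then compares the expiry against its own clock. The digest (SHA-256 here), the helper names `sign_temp_url` and `is_valid`, and the sample path are assumptions for illustration; a real cluster may be configured for a different digest, and the real middleware does more than this.

```python
import hmac
import time
from hashlib import sha256


def sign_temp_url(key, method, path, expires, digest=sha256):
    """Return the hex signature over method, expiry and path (digest is an assumption)."""
    hmac_body = f"{method}\n{expires}\n{path}"
    return hmac.new(key.encode(), hmac_body.encode(), digest).hexdigest()


def is_valid(key, method, path, expires, sig, now=None):
    """Hypothetical server-side check: nothing is read from the object itself."""
    now = time.time() if now is None else now
    if now > expires:
        return False  # past the expiry: rejected regardless of the signature
    # Simplification of the behaviour discussed in the log:
    # a HEAD request is accepted with a signature made for GET or PUT.
    candidates = [method]
    if method == "HEAD":
        candidates += ["GET", "PUT"]
    return any(
        hmac.compare_digest(sign_temp_url(key, m, path, expires), sig)
        for m in candidates
    )


if __name__ == "__main__":
    key, path, expires = "MYKEY", "/v1/AUTH_test/container/object", 2000
    get_sig = sign_temp_url(key, "GET", path, expires)
    print(is_valid(key, "GET", path, expires, get_sig, now=1000))  # signed method
    print(is_valid(key, "PUT", path, expires, get_sig, now=1000))  # other method
    print(is_valid(key, "GET", path, expires, get_sig, now=3000))  # after expiry
```

Because the method and expiry are both inputs to the signature, a URL signed for GET cannot be replayed as a PUT, and editing `temp_url_expires` in the query string invalidates the signature.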
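
A small sketch of the `socket.AI_NUMERICHOST` behaviour dcourtoi mentions at 20:51: with that flag, `socket.getaddrinfo` only accepts literal IP addresses and raises `socket.gaierror` for a hostname instead of resolving it. This does not reproduce Swift's `whataremyips()` or its ring comparison; `parses_as_ip`, `is_local` and the example hostname are hypothetical names used only to illustrate why a hostname in the ring never matches a list of local IPs.

```python
import socket


def parses_as_ip(host):
    """True if host is a literal IP address; AI_NUMERICHOST forbids any name lookup."""
    try:
        socket.getaddrinfo(host, None, flags=socket.AI_NUMERICHOST)
    except socket.gaierror:
        return False
    return True


def is_local(ring_ip, my_ips):
    """Hypothetical stand-in for a ring-vs-local check: a plain string comparison."""
    return ring_ip in my_ips


if __name__ == "__main__":
    my_ips = ["127.0.0.1"]
    for dev_ip in ("127.0.0.1", "storage-01.example.com"):
        print(dev_ip, parses_as_ip(dev_ip), is_local(dev_ip, my_ips))
```

Under these assumptions, a ring entry that holds a hostname fails both checks, which is consistent with the object server never finding a matching local device.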