21:00:16 #startmeeting swift
21:00:17 Meeting started Wed Dec 16 21:00:16 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:20 The meeting name has been set to 'swift'
21:00:25 who's here for the swift meeting?
21:00:33 hi
21:00:35 o/
21:00:50 hi
21:00:52 hi o/
21:01:01 o/ (only for a short while)
21:02:00 thank you all for coming -- i may be a little in-and-out; handling some childcare duties again
21:02:18 as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:35 first up
21:02:42 o/
21:02:46 #topic end-of-year meeting schedule
21:03:15 how about: this is it. this is the last meeting of 2020. see you later 2020.
21:03:45 yeah, that :-)
21:03:56 Lol, damn you 2020 :p
21:04:06 Red Hat enters the Christmas shutdown until January.
21:04:20 But we could have one last meeting if we wanted.
21:04:52 nah -- next meeting on Jan 6 seems perfectly reasonable
21:05:36 next topic
21:05:41 #topic audit watchers
21:05:46 we're so close!
21:06:14 #link https://review.opendev.org/c/openstack/swift/+/706653
21:06:15 Well, the df was the last principal problem, I think.
21:06:35 Now, even if we get back to independent processes in the future, we can.
21:06:47 sorry that i haven't done another pass since my review last week
21:07:01 So, I'm honestly content with the final revision.
21:07:25 I'll take another look this week, and hopefully add my +2 again
21:07:46 i am still a bit worried about the need to distinguish start/end for different workers when you've got more than one
21:08:35 mattoliverau: I added the doc that you asked for. And it includes the Dark Data part after all. At first, I hoped to sweep it under the carpet and only use it in case of emergency at customer clusters.
21:08:45 Nice
21:09:14 timburke: I thought it added their name to the logger
21:09:18 Yes, you were right. Doing what I meant is how tribal memory is generated, and it's wrong.
21:09:33 Yes, logs have prefixes
21:09:54 i think the device_key/worker-id was my last major concern, and i think we could remedy that with a new arg to end (and maybe start? i'm not actually sure how important it is there...)
21:10:01 And in fact, using watcher_name is better because that comes from proxy-server.conf, and is not the name of the Python class.
21:10:57 It's easy to add new arguments thanks to Sam's foresight. I was way more concerned about letting df get stuck in there. But if you want to add some, it's easy to do in a follow-up.
21:11:33 mattoliverau, we get watcher prefixes, but we still don't have a way to distinguish between the same watcher spread across multiple workers
21:11:57 True, but why is it needed?
21:12:52 if you've got, say, a 24-bay chassis and 4 workers per node (so each worker is responsible for 6 disks), when you go to dump stats to recon (say), you don't want to have all four workers writing a quarter of the full stats to the same keys
21:12:58 Workers are ephemeral, so... even if you know the PID, all you can do is kill the whole auditor and maybe restart it. Thinking as an operator here, the most important thing is to know which object triggered issues.
21:13:15 Oh, that way.
21:13:50 i was realizing it as i was thinking through https://review.opendev.org/c/openstack/swift/+/766640 (watchers: Add a policy-stat watcher)
21:14:23 Good spot. So we need to append a worker ID or something?
21:15:09 i think so. something along the lines of the device_key from https://github.com/openstack/swift/blob/2.26.0/swift/obj/auditor.py#L98-L103
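A minimal sketch of the per-worker recon key idea being discussed here, assuming a hypothetical recon_key() helper; the key format, the arguments, and the PID fallback are illustrative only, not the watcher API or the auditor's actual device_key logic:

    import os

    def recon_key(watcher_name, device_dirs=None):
        # Prefer naming the devices this worker is responsible for, so each
        # worker writes its stats under its own key, e.g. "policy_stat.sdb,sdc"
        # -- roughly the device_key idea linked above.
        if device_dirs:
            suffix = ','.join(sorted(device_dirs))
        else:
            # Fallback: os.getpid() is unique across concurrent workers, but it
            # changes on restart, so stale keys can pile up in the recon cache.
            suffix = 'pid-%d' % os.getpid()
        return '%s.%s' % (watcher_name, suffix)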
21:15:23 Right... of course you can do os.getpid() now safely, but eh... My mental model was that you just increment all stats in some central place like memcached or Prometheus, and reset them at a wall-clock moment, like midnight on Mondays, rather than when the auditor starts. That would give you comparable counts to watch trends.
21:16:21 If you insist on recon specifically, then a key is needed.
21:16:46 But I think you can add it in a follow-up.
21:17:07 good thought on os.getpid() -- forgot about that... might be sufficient
21:17:22 os.getpid() changes when the auditor restarts, so you'll have a ton of old recon files in /var
21:17:36 well, if you reboot
21:18:08 i could also re-work it so that everything's always aggregated by device, and i write things to recon based on that. solves the same problem when the worker count changes
21:18:52 Hmm. We never have 2 workers crawling the same device?
21:19:14 shouldn't; not for the same audit-type, anyway
21:20:13 (seems like it'd make for more disk-thrashing)
21:21:25 Okay. I still think it's good for your and Matt's final review pass.
21:21:26 oh, i also need to think about the resumability of auditors... if they get interrupted, they pick up again more or less where they left off, right? hmm...
21:21:37 all right, i'll make sure to review it again within the next three weeks, and it sounds like mattoliverau will try to do the same
21:21:44 More or less. They write that JSON checkpoint thing.
21:22:27 #topic py3 fixes
21:23:16 i was noticing that we've got a few py3 fixes that i wanted to raise attention for
21:23:38 see https://wiki.openstack.org/wiki/Swift/PriorityReviews
21:24:10 timburke: I'll volunteer to review https://review.opendev.org/c/openstack/swift/+/759075 if you like
21:24:38 thanks! it could use a test, but i know i've seen https://bugs.launchpad.net/swift/+bug/1900770 while running tests in my aio
21:24:40 Launchpad bug 1900770 in OpenStack Object Storage (swift) "py3 comparison troubles" [High,In progress]
21:25:16 yup, maybe I'll put a test together, will do me good to re-educate myself about bad buckets
21:25:36 https://review.opendev.org/c/openstack/swift/+/765204 has been observed in the wild: https://bugs.launchpad.net/swift/+bug/1906289
21:25:37 Launchpad bug 1906289 in OpenStack Object Storage (swift) "Uploading a large object (SLO) in foreign language characters using S3 browser results in 400 BadRequest - Error in completing multipart upload" [High,Confirmed]
21:26:23 * mattoliverau needs to take the car in for a service.
21:26:41 Gotta run, have a great one all o/
21:27:54 mattoliverau: later
21:28:24 and https://review.opendev.org/c/openstack/swift/+/695781 is one that i'd mostly forgotten about, but can let bad utf-8-decoded-as-latin-1-encoded-as-utf-8 out to the client
21:28:46 right... are there any more besides these 3?
21:29:17 probably. those are the three i could remember ;-)
21:30:09 i *really* want to get to the point that i can feel confident in moving my prod clusters to py3
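To make the encoding problem above concrete, here is a small, self-contained illustration in plain Python (not Swift code): under PEP 3333, a py3 WSGI server hands the application the raw request bytes decoded as latin-1, so re-encoding those strings as UTF-8 instead of latin-1 is what leaks mojibake to the client.

    name = u'\u30c6\u30b9\u30c8'           # a non-ASCII object name ("test" in Japanese)
    wire = name.encode('utf-8')            # the UTF-8 bytes that arrive on the wire
    wsgi_str = wire.decode('latin-1')      # what a py3 WSGI server hands the app (PEP 3333)

    assert wsgi_str.encode('latin-1') == wire  # correct: round-trips to the original bytes
    assert wsgi_str.encode('utf-8') != wire    # wrong: double-encoded; clients see garbage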
21:31:40 moving on
21:31:50 #topic finishing sharding
21:32:34 i came in late last week, so i wanted to check if there was any more discussion needed here, or if we've got a pretty good idea of what would be involved
21:33:08 I don't, but I sent David to investigate and teach me :-)
21:33:21 * zaitcev manages
21:33:49 * zaitcev shuffles some more documents
21:34:24 my summary was: 1. be able to recover from whatever could go wrong with auto-sharding (split brain) 2. do our best to prevent split-brain auto-sharding 3. get more confident about auto-shrinking
21:35:06 and suggested some current patches as a good starting place to get involved
21:35:17 sounds like a great plan :-)
21:35:31 e.g. the chain starting with https://review.opendev.org/c/openstack/swift/+/741721
21:35:35 i won't worry then
21:35:41 haha
21:35:51 one last-minute topic
21:35:58 #topic stable gate
21:36:07 BTW I updated priority reviews because I have squashed a couple of patches into https://review.opendev.org/c/openstack/swift/+/741721
21:36:48 currently, things are fairly broken. mostly to do with pip-on-py2 trying to drag in a version of bandit that's py3-only
21:38:00 there are some patches to pin bandit, and at least some of them are mergeable, but it looks like there are some other requirements issues going on that complicate some branches
21:38:27 didn't a bandit fix merge?
21:38:57 i'm going to keep working on getting those fixed, just wanted to keep people apprised
21:39:07 https://review.opendev.org/c/openstack/swift/+/765883 ?
21:40:17 yeah, that at least got master moving. i might be able to do that for one or two of the more-recent branches, too
21:40:38 OIC there's a bunch of backport patches
21:41:02 someone proposed a fix back on pike through stein like https://review.opendev.org/c/openstack/swift/+/766495
21:41:40 ok, is py2-constraints the right way though?
21:42:21 not all branches have a py2-constraints. though maybe we could introduce that?
21:43:01 fwiw, a cap in test-requirements.txt hits failures like https://zuul.opendev.org/t/openstack/build/284cdb5099114af685a4bfeb53b0d2ff/log/job-output.txt#520-522 on some branches
21:43:12 no ;python_version=='2.7' though for that bandit, I wonder why
21:44:29 another option would be to just drop bandit from test-requirements.txt on (some?) stable branches -- we don't backport *that* much, and i'm not sure how much value we get from running bandit checks on stable
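For illustration only, the environment-marker pin the question above alludes to would look something like the lines below; the exact version bound and whether it belongs in test-requirements.txt or a constraints file are assumptions here, not the actual patches that merged on any branch:

    # test-requirements.txt (sketch only)
    bandit>=1.1.0,<1.6.3;python_version<'3.0'   # assumed last bandit release that still installs on py2
    bandit>=1.1.0;python_version>='3.0'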
21:45:12 that's all i've got
21:45:18 #topic open discussion
21:45:25 anything else we should bring up this week?
21:45:43 thanks timburke for all your work on the gate issues, it's incredibly valuable
21:46:24 anything i can do so you guys can focus on making swift great!
21:46:40 I don't understand how bandit even gets invoked. There's a [bandit] in tox.ini, but it's not in the list at the top or in any zuul jobs.
21:47:11 oh. maybe it's not on the master branch.
21:48:21 it's part of the pep8 tox env
21:48:22 iirc it's a flake8 plugin -- just install it and it'll start getting run as well
21:48:30 Oh, right.
21:49:23 Okay. I don't have anything else to discuss. Michele managed to push through that patch for swift-init, but I have no idea if he's going to stick around.
21:50:50 oh yeah! looking at the bug report (https://bugs.launchpad.net/swift/+bug/1079075), i'm not actually sure that the title was really accurate...
21:50:52 Launchpad bug 1079075 in OpenStack Object Storage (swift) "swift-init should check if binary exists before starting" [Low,In progress] - Assigned to Michele Valsecchi (mvalsecc)
21:51:12 how so? He wanted not to have extra messages.
21:51:33 So, there's no change in function.
21:51:45 By "he" I mean the original reporter.
21:52:19 but the reason processes didn't start up wasn't actually missing binaries (afaict)
21:52:46 "fails because some *configuration files* are not existent"
21:53:03 well yeah
21:53:19 Someone removed both configurations and binaries
21:53:42 You know, I used to try that crap too. It was a mistake. But our RPM packages used to be very fine-grained like that.
21:54:36 *shrug* if it's still a problem, we'll get a new bug report ;-)
21:54:40 But then we started to share a bunch of code across types of services. For example, GET on accounts and containers uses a function that's not in common code, but in container IIRC. So, when someone installs just one type of service, it blows up.
21:55:07 I had to give up and create a common package that contains all of the code, no matter where it belongs.
21:55:56 So the logic was, if swift-init starts checking for binaries, it would not attempt to run something that has no configuration.
21:55:57 see
21:56:37 So, I think it was an appropriate patch and it was okay for us to include it.
21:57:06 Well, its value was very low. It only helps people who do this fine-grained installation.
21:57:28 cool. yeah, i'm not worried about the patch; i do think it makes swift better. just thinking about whether the bug should be closed or not
21:57:30 Oh, Tim
21:57:41 Yeah, of course close it.
21:57:55 One question: when are we going to drop py2?
21:58:06 great question!
21:58:10 i don't know!
21:58:37 * zaitcev backrolls in nagare kaiten
21:58:46 lol
21:59:12 i feel like with train/ussuri we saw a decent number of new clusters stood up running py3-only
21:59:45 and more recently in victoria/wallaby we're seeing clusters that were on py2 migrate to py3
21:59:54 I'm sure projects other than bandit are going to put pressure on us. I think eventlet is the worst of them.
22:01:27 yup -- it's a growing worry for me too -- see https://github.com/eventlet/eventlet/pull/665 for their deprecation (i don't think they've dropped it yet, but it's just a matter of time)
22:01:31 Red Hat offers 7 years on some of the supported releases, but they have a controlled set of packages + backported patches. But in the trunk it's kind of a pain.
22:02:17 thinking mostly selfishly, i'll say "not until i've migrated off of py2 myself" ;-)
22:02:44 so, you have trunk on py2?
22:03:03 my prod clusters run py2, yes
22:03:10 What's the OS? Some kind of old Ubuntu I presume.
22:03:11 (home cluster's py3 though!)
22:03:23 centos7, mainly
22:03:39 i think we've got some legacy customers still on ubuntu
22:03:43 Right, that is py2.
22:03:53 OK thanks for the answer.
22:04:11 we package our own python; system python is a pain
22:04:32 all right, sorry, i let us go over time. thank you all for coming, and thank you for working on swift!
22:04:37 #endmeeting