*** marios is now known as marios|ruck | 05:38 | |
*** jpena|off is now known as jpena | 07:00 | |
*** rpittau|afk is now known as rpittau | 07:15 | |
*** sshnaidm_ is now known as sshnaidm | 08:20 | |
opendevreview | Simon Westphahl proposed zuul/zuul master: Don't use transactions when sharding large events https://review.opendev.org/c/zuul/zuul/+/803154 | 08:45 |
---|---|---|
opendevreview | Simon Westphahl proposed zuul/zuul master: Store build params separate from request https://review.opendev.org/c/zuul/zuul/+/803155 | 08:45 |
opendevreview | Simon Westphahl proposed zuul/zuul master: Remove transaction support from sharding API https://review.opendev.org/c/zuul/zuul/+/803156 | 08:45 |
opendevreview | Simon Westphahl proposed zuul/zuul master: Don't use transactions when sharding large events https://review.opendev.org/c/zuul/zuul/+/803154 | 08:53 |
opendevreview | Simon Westphahl proposed zuul/zuul master: Store build params separate from request https://review.opendev.org/c/zuul/zuul/+/803155 | 08:53 |
opendevreview | Simon Westphahl proposed zuul/zuul master: Remove transaction support from sharding API https://review.opendev.org/c/zuul/zuul/+/803156 | 08:53 |
opendevreview | Simon Westphahl proposed zuul/zuul master: Remove redundant get when removing a build request https://review.opendev.org/c/zuul/zuul/+/803158 | 09:03 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 09:05 |
opendevreview | Simon Westphahl proposed zuul/zuul master: Vendor and use fixed Kazoo read/write lock https://review.opendev.org/c/zuul/zuul/+/803166 | 10:11 |
*** jcapitao is now known as jcapitao_lunch | 10:39 | |
opendevreview | Simon Westphahl proposed zuul/zuul master: Don't use transactions when sharding large events https://review.opendev.org/c/zuul/zuul/+/803154 | 11:17 |
opendevreview | Simon Westphahl proposed zuul/zuul master: Store build params separate from request https://review.opendev.org/c/zuul/zuul/+/803155 | 11:17 |
opendevreview | Simon Westphahl proposed zuul/zuul master: Remove transaction support from sharding API https://review.opendev.org/c/zuul/zuul/+/803156 | 11:17 |
*** jpena is now known as jpena|lunch | 11:30 | |
felixedel | mhu: I've reviewed your UI patches for autohold and authentication https://review.opendev.org/q/topic:%22fffaff%22+(status:open%20OR%20status:merged). I hope the comments are helpful. Let me know if anything is unclear or if I should provide some examples or prototype something. | 12:17 |
*** jpena|lunch is now known as jpena | 12:28 | |
opendevreview | Matthieu Huin proposed zuul/zuul master: [api][cors] Add CORS configuration https://review.opendev.org/c/zuul/zuul/+/767691 | 12:57 |
corvus | swest: if i change test_sharded_tenant_trigger_events to read data = {'test': "x" * (sharding.NODE_BYTE_SIZE_LIMIT * 2)} the test still passes, and i can see that it made 3 sequence nodes for sharding | 13:02 |
swest | corvus: we only compress the content of a znode after we split it based on the original data size | 13:02 |
corvus | swest: oh that seems backwards | 13:03 |
swest | corvus: from 797156 it sounded like we mainly wanted to reduce data size on the network and in zk | 13:06 |
swest | if we want to do compression before sharding we probably need to do that in the BufferedShard[Reader|Writer] implementation | 13:06 |
swest | but that would still lead to the same issue with the transaction when we exceed the jute.maxbuffer limit | 13:08 |
corvus | swest: yeah, the order isn't that important -- it probably mostly works out well enough with actual data | 13:10 |
corvus | swest: the test fails with the expected error if we do `data = {'test': ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(sharding.NODE_BYTE_SIZE_LIMIT * 10))}` | 13:11 |
corvus | swest: not that i doubted you -- but i did want to understand why our tests failed to catch it :{ | 13:11 |
corvus | * swest: not that i doubted you -- but i did want to understand why our tests failed to catch it :) | 13:12 |
swest | sure | 13:12 |
corvus | swest: thanks for the changes -- i'll take a look at those -- and maybe update the tests too :) i know there are some tricky sequencing problems that the transaction solved | 13:13 |
swest | corvus: yea, for the build request I had to move the params to a separate data node similar to the event data. I don't think there should be any issues as long as we create the build/event data before we create the build request or the event in the queue | 13:16 |
swest | only downside of not using a transaction is, that we need to handle the cleanup of leaked build or event data in a periodic cleanup | 13:17 |
opendevreview | Matthieu Huin proposed zuul/zuul master: CORS: support regular expressions in allowed origins https://review.opendev.org/c/zuul/zuul/+/803209 | 13:40 |
*** jcapitao_lunch is now known as jcapitao | 13:40 | |
corvus | swest: i suspect we should go ahead and vendor in your fixed lock recipe -- do you want to do that? | 13:50 |
swest | corvus: 803166 | 13:50 |
corvus | swest: great thanks :) | 13:50 |
swest | corvus: I moved the vendored watchers and lock recipe to zuul/zk/vendor | 13:51 |
swest | otherwise we'd have lock.py and locks.py in the same path which might get confusing | 13:52 |
opendevreview | Matthieu Huin proposed zuul/zuul master: Add authentication-realm attribute to tenants https://review.opendev.org/c/zuul/zuul/+/735586 | 13:55 |
opendevreview | Benjamin Schanzel proposed zuul/zuul master: Add tenant name on NodeRequests for Nodepool https://review.opendev.org/c/zuul/zuul/+/788680 | 13:59 |
corvus | felixedel: did you look at https://review.opendev.org/803117 ? | 14:04 |
felixedel | corvus: Now I did | 14:08 |
opendevreview | James E. Blair proposed zuul/zuul master: Execute merge jobs via ZooKeeper https://review.opendev.org/c/zuul/zuul/+/802299 | 14:17 |
corvus | felixedel: thanks! i squashed it. now i think we'll need to review swest's changes and restack the merger api on top of them. | 14:17 |
corvus | swest: changes lgtm so far but we should add the orphaned data cleanup first. | 14:37 |
corvus | swest: i can work on that today if you haven't started on it | 14:39 |
swest | corvus: I won't have time to do that today but can look into that tomorrow if you are busy with something else | 14:40 |
corvus | swest: i should be able to take care of it | 14:40 |
*** jpena is now known as jpena|off | 15:25 | |
*** rpittau is now known as rpittau|afk | 16:24 | |
*** sshnaidm is now known as sshnaidm|afk | 16:25 | |
*** marios|ruck is now known as marios|out | 16:26 | |
opendevreview | James E. Blair proposed zuul/zuul master: Clean up leaked event queue side channel data https://review.opendev.org/c/zuul/zuul/+/803245 | 18:20 |
opendevreview | James E. Blair proposed zuul/zuul master: Store build params separate from request https://review.opendev.org/c/zuul/zuul/+/803155 | 20:50 |
opendevreview | James E. Blair proposed zuul/zuul master: Cleanup leaked build request parameters https://review.opendev.org/c/zuul/zuul/+/803262 | 20:50 |
*** dviroel is now known as dviroel|out | 20:53 | |
opendevreview | James E. Blair proposed zuul/zuul master: Cleanup leaked build request parameters https://review.opendev.org/c/zuul/zuul/+/803262 | 21:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Move executor api cleanup to apscheduler https://review.opendev.org/c/zuul/zuul/+/803264 | 21:07 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove transaction support from sharding API https://review.opendev.org/c/zuul/zuul/+/803156 | 21:08 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove redundant get when removing a build request https://review.opendev.org/c/zuul/zuul/+/803158 | 21:08 |
opendevreview | James E. Blair proposed zuul/zuul master: Cleanup leaked build request parameters https://review.opendev.org/c/zuul/zuul/+/803262 | 21:30 |
opendevreview | James E. Blair proposed zuul/zuul master: Move executor api cleanup to apscheduler https://review.opendev.org/c/zuul/zuul/+/803264 | 21:30 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove transaction support from sharding API https://review.opendev.org/c/zuul/zuul/+/803156 | 21:30 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove redundant get when removing a build request https://review.opendev.org/c/zuul/zuul/+/803158 | 21:30 |
corvus | clarkb, tobiash, swest: i've added to and restacked swest's changes from earlier. the gist is that my "transaction" fix to the payload size issues observed with very large repos was flawed, and so was the test of it. the new stack fixes the issue and the test. would you please review this with a relatively high priority (since it addresses issues with current master/release)? https://review.opendev.org/q/status:open+hashtag:sos-fix | 21:47 |
clarkb | corvus: yes I can start to look at that once I've gotten the opendev meeting agenda out. | 21:47 |
corvus | (i cheated and also added the kazoo lock vendor/fix in there, mostly because i'm worried about divergence if we add new read/write locks) | 21:48 |
corvus | clarkb: thanks! | 21:48 |
opendevreview | lotorev vitaly proposed zuul/zuul master: doc: Specify job.required-projects is overriden https://review.opendev.org/c/zuul/zuul/+/803269 | 22:07 |
clarkb | corvus: in https://review.opendev.org/c/zuul/zuul/+/803154 what ensures we don't read a half written set of data? is this a keep reviewing the whole stack situation? | 22:17 |
corvus | clarkb: no that should be in place there -- basically it's the "write the sharded data first, then write the event last" that makes that work | 22:22 |
clarkb | I guess we write the actual event after the side channel | 22:22 |
clarkb | yup | 22:22 |
corvus | oh shoot -- is that going to race with the cleanup i wrote in the next patch? | 22:23 |
clarkb | possibly | 22:23 |
clarkb | I'm only just now reading that change so not sure yet :) | 22:23 |
clarkb | corvus: do you need to add a sequence id to the shard data entries? then if the last queue entry sequence id is > the shard data we know we can clean the shard data? | 22:26 |
clarkb | though that may not be true unless you also prevent writes to the queue in parallel | 22:26 |
corvus | i think parallel writes are expected | 22:28 |
corvus | so we could have A start, B start, B finish, A finish | 22:28 |
corvus | we also don't know the sequence id when we write the shard data | 22:29 |
corvus | i think we may have to resort to timestamps? | 22:29 |
corvus | that's so hard to test though :/ | 22:30 |
clarkb | ya if timestamp of shard data is at least 5 minutes old and doesn't have a corresponding event it is highly likely to have leaked? | 22:30 |
clarkb | a delta that large between writing shard data and the event itself should basically never happen | 22:31 |
corvus | yep | 22:31 |
clarkb | fwiw I think this is something that kafka struggled with | 22:31 |
clarkb | they basically had to keep all the real data outside of zookeeper then index into it via zk? I want to say this is one reason they ended up just absorbing the pacsos type db into their own software | 22:32 |
corvus | yeah, it's a problem :) | 22:34 |
clarkb | corvus: reading the cleanup code it seems that it only cleans up shard data for events that are in the queue? | 22:35 |
clarkb | which may mean we are not susceptible to problems as long as we lock around queue processing (I think we do) | 22:36 |
clarkb | hwoever, it does mean that if we clean up an event but not its shard data we'd still leak that shard data? | 22:36 |
clarkb | corvus: also left another thing on https://review.opendev.org/c/zuul/zuul/+/803245 which may end up related to ^ | 22:38 |
opendevreview | James E. Blair proposed zuul/zuul master: Clean up leaked event queue side channel data https://review.opendev.org/c/zuul/zuul/+/803245 | 22:40 |
corvus | clarkb: re the first thing -- it should only clean up shard data that isn't referenced by an event | 22:41 |
corvus | ie: make candidate list of shard data; remove any shard data from that list if it's referenced by a current event | 22:42 |
clarkb | corvus: oh I see the remove call there is against the set | 22:43 |
clarkb | I totally read that as remove from zk | 22:43 |
clarkb | ok ya | 22:43 |
corvus | cool -- the next ps replaced that double event list with the timestamp check | 22:43 |
corvus | i'll fixup the comments too | 22:43 |
clarkb | yup looking at the new ps now | 22:43 |
clarkb | ya I think this new patchset will work. It builds a candidate list based on age. If we still see the event for the a candidate it gets removed from the list. Then we return the list of candidates that have no corresponding event | 22:45 |
opendevreview | James E. Blair proposed zuul/zuul master: Clean up leaked event queue side channel data https://review.opendev.org/c/zuul/zuul/+/803245 | 22:45 |
corvus | clarkb: bonus: filtering by age will reduce the candidate set to 0 in most cases | 22:45 |
corvus | that ps is just the comment update | 22:46 |
clarkb | corvus: should you add that check in _findLostSideChannelData() ? if data_nodes: #get events and iterate over events | 22:47 |
clarkb | corvus: that way we reduce the number of zk queries necessary to scan for leaks | 22:47 |
corvus | clarkb: oh you mean short circuit if candidate list is empty? | 22:49 |
clarkb | yes | 22:50 |
clarkb | left a comment with more info on that | 22:50 |
opendevreview | James E. Blair proposed zuul/zuul master: Clean up leaked event queue side channel data https://review.opendev.org/c/zuul/zuul/+/803245 | 22:51 |
corvus | done | 22:51 |
clarkb | lgtm | 22:52 |
clarkb | corvus: are you going to rebase the children of ^ if so I'll wait for that before I start reviewing | 22:52 |
corvus | clarkb: yeah, gimme 5 mins to finish up the same update for the next pair of changes | 22:53 |
clarkb | ok | 22:53 |
opendevreview | James E. Blair proposed zuul/zuul master: Store build params separate from request https://review.opendev.org/c/zuul/zuul/+/803155 | 22:57 |
opendevreview | James E. Blair proposed zuul/zuul master: Cleanup leaked build request parameters https://review.opendev.org/c/zuul/zuul/+/803262 | 22:57 |
corvus | clarkb: okay that's the next pair | 22:57 |
corvus | ftr, i really would like to DRY this, but every one of these has just enough subtle differences in paths or behavior that make that impractical | 22:58 |
opendevreview | James E. Blair proposed zuul/zuul master: Move executor api cleanup to apscheduler https://review.opendev.org/c/zuul/zuul/+/803264 | 22:59 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove transaction support from sharding API https://review.opendev.org/c/zuul/zuul/+/803156 | 22:59 |
opendevreview | James E. Blair proposed zuul/zuul master: Remove redundant get when removing a build request https://review.opendev.org/c/zuul/zuul/+/803158 | 22:59 |
clarkb | corvus: why use sets and iterators and dicts in the cleanup and find paths differently | 23:09 |
clarkb | looking at the code I think https://review.opendev.org/c/zuul/zuul/+/803262/4/zuul/zk/executor.py _findLostParams might be able to use a set like _findLostSideChannelData() ? might be nice to be consistent there | 23:09 |
clarkb | anyway I'll leave proper review comments | 23:10 |
corvus | clarkb: because we deduplicate by build uuid, but we return paths | 23:10 |
clarkb | oh I see | 23:10 |
corvus | we need a path at step 1 to get zstat, then an id at step 2 to de-duplicate, then a path at step 3 to return to the caller. i figure since we need to keep both bits of data past step 2, a dict made sense to pair them | 23:11 |
clarkb | ya and unlike the side channel data the ids are meaningful here | 23:12 |
corvus | exactyl. and that's one of those subtle differences that means this can't be a straight copy/reuse :/ | 23:12 |
clarkb | small note on https://review.opendev.org/c/zuul/zuul/+/803264 as a sanity check | 23:19 |
clarkb | the stack lgtm now | 23:21 |
corvus | clarkb: yep, good to be aware of that | 23:22 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!