opendevreview | James E. Blair proposed zuul/zuul master: Use the nodeset build parameter instead of hosts/groups https://review.opendev.org/c/zuul/zuul/+/799127 | 00:26 |
corvus | clarkb: ^ in response to your comments; tristanC ^ can you look at that and check if that will work with the runner (i suspect it may need a minor update) | 00:28 |
opendevreview | James E. Blair proposed zuul/zuul master: WIP: Remove nodes/groups from build parameters https://review.opendev.org/c/zuul/zuul/+/799128 | 00:28 |
opendevreview | Merged zuul/zuul master: Lock/unlock nodes on executor server https://review.opendev.org/c/zuul/zuul/+/774610 | 00:48 |
opendevreview | Ian Wienand proposed zuul/nodepool master: [wip] make /sys ro in container https://review.opendev.org/c/zuul/nodepool/+/799125 | 00:53 |
opendevreview | Merged zuul/zuul master: Remove ExecutorApi.update() call from tests https://review.opendev.org/c/zuul/zuul/+/798778 | 03:23 |
opendevreview | Merged zuul/zuul master: Add some jitter to apscheduler interval cleanup jobs https://review.opendev.org/c/zuul/zuul/+/798953 | 03:47 |
opendevreview | Merged zuul/zuul master: Add comment for test method https://review.opendev.org/c/zuul/zuul/+/798954 | 03:47 |
*** jpena|off is now known as jpena | 06:52 | |
opendevreview | Simon Westphahl proposed zuul/zuul master: Refactor pipeline processing in run handler https://review.opendev.org/c/zuul/zuul/+/795985 | 09:40 |
opendevreview | Tobias Henkel proposed zuul/nodepool master: Speed up node list https://review.opendev.org/c/zuul/nodepool/+/760542 | 11:18 |
opendevreview | Tobias Henkel proposed zuul/nodepool master: Delete init nodes when resetting lost requests https://review.opendev.org/c/zuul/nodepool/+/744107 | 11:21 |
opendevreview | Tobias Henkel proposed zuul/zuul master: Call manageLoad during pause and unpause https://review.opendev.org/c/zuul/zuul/+/755765 | 11:23 |
*** jpena is now known as jpena|lunch | 11:36 | |
*** bhagyashris_ is now known as bhagyashris|ruck | 12:15 | |
*** jpena|lunch is now known as jpena | 12:36 | |
opendevreview | Tobias Henkel proposed zuul/zuul master: Optimize stats reporting per node request https://review.opendev.org/c/zuul/zuul/+/754967 | 12:53 |
opendevreview | Denis proposed zuul/zuul master: [api][cors] Access-Control-Allow-Origin * for all routes https://review.opendev.org/c/zuul/zuul/+/767691 | 14:09 |
corvus | i've just restarted opendev's zuul on master | 14:31 |
tobiash[m] | yay, fingers crossed | 14:33 |
corvus | some jobs have run to completion, so yay! i'm not quite sure what to make of the changes in zk behavior yet | 14:35 |
opendevreview | James E. Blair proposed zuul/zuul master: Fix zuul.executors.accepting stats bug https://review.opendev.org/c/zuul/zuul/+/799219 | 15:07 |
corvus | tobiash: ^ | 15:07 |
corvus | tobiash: so we still are lacking a method of calculating the executor queue, right? | 15:07 |
corvus | tobiash: i think that was in an old version of the executor api change, but somehow got dropped | 15:11 |
tobiash[m] | I think so | 15:11 |
corvus | i found it in commit 0004c2865c (an old unmerged commit of I5de26afdf6774944b35472e2054b93d12fe21793) | 15:12 |
corvus | i'll see if i can extract it | 15:12 |
corvus | the total is easy; the zone stuff is going to require a little extra work | 15:19 |
corvus | so i think this will be original code instead of just extracting the old stuff | 15:20 |
opendevreview | James E. Blair proposed zuul/zuul master: Correct executor queue stats https://review.opendev.org/c/zuul/zuul/+/799223 | 15:38 |
corvus | tobiash, fungi: ^ i think that should do it | 15:38 |
corvus | we may have a steadily increasing number of zk watches -- we should keep an eye on that and see if it levels out or keeps increasing | 15:40 |
clarkb | those should grow with the number of nodes right? since we add watches under each znode with contents worth watching? If we see znode count fall dramatically but watches continue to climb that could indicate a leak | 15:41 |
corvus | yeah, nodes are holding steady at 40k while watches are increasing linearly | 15:42 |
corvus | it's possible we're not clearing them from the executor api | 15:43 |
corvus | but i want to give it a bit more time before i call it a pattern | 15:43 |
clarkb | corvus: super small thing on the stats fix | 15:47 |
*** jpena is now known as jpena|off | 15:53 | |
opendevreview | James E. Blair proposed zuul/zuul master: Correct executor queue stats https://review.opendev.org/c/zuul/zuul/+/799223 | 16:09 |
corvus | clarkb: ++ | 16:09 |
clarkb | corvus: are we able to query zk for a list of watches via the admin side? | 16:31 |
clarkb | corvus: if so maybe we take a listing now then wait $timeperiod and take another listing and check the delta? | 16:31 |
clarkb | if we see watches going away we know we aren't leaking those watches but if some category does not go away then there is a good chance those do leak? | 16:32 |
opendevreview | Merged zuul/zuul master: Fix zuul.executors.accepting stats bug https://review.opendev.org/c/zuul/zuul/+/799219 | 16:34 |
clarkb | wchs, wchc, and wchp are commands that will do that (I don't think we have any enabled via the socket but we can hit them via the rest api?). Note that wchc and wchp come with warnings that those operations can be expensive and impact the server | 16:49 |
clarkb | corvus: I think wchp is what we want: "Lists detailed information on watches for the server, by path. This outputs a list of paths (znodes) with associated sessions. Note, depending on the number of watches this operation may be expensive (ie impact server performance), use it carefully." | 16:59 |
clarkb | of course now we have to decide if we are willing to try it (maybe run it on zk04 which has the least number of watches?) | 16:59 |
clarkb | corvus: running wchs (the one not annotated with be careful messages) shows the number of paths and the number of watches. The number of paths seems to be growing far less than the number of watches | 17:11 |
clarkb | but both do seem to grow | 17:12 |
corvus | clarkb: sorry was away; good ideas | 17:24 |
clarkb | that's ok I've just been monitoring wchs informally and both num paths and watches go up | 17:25 |
corvus | clarkb: i'd like to do either a wchc or wchp; which do you think? i'm thinking wchp | 17:25 |
* clarkb rereads the docs | 17:25 | |
clarkb | corvus: ya wchp seems better since it is watches by path. I suspect that mapping will make it easier to find any potential leaks | 17:26 |
clarkb | corvus: do you want me to go ahead and run that on zk04 or do you want to do it? It appears that it pretty prints json too though I should test that redirected | 17:27 |
clarkb | interestingly zk05 had a large drop in watches but zk04 and zk06 have not | 17:29 |
corvus | clarkb: why don't you do it and save the output since you have a session going | 17:32 |
clarkb | ok | 17:33 |
clarkb | corvus: zk04:~clarkb/wchp.20210702-first | 17:34 |
corvus | i'm logged into zk04 (also, this is borderline opendev ops related, but i figure since we're suspecting a zuul issue, we can keep going in this channel) | 17:34 |
clarkb | ya I figured it would be of interest to this group | 17:34 |
corvus | as expected, many build request watches | 17:34 |
clarkb | I can rerun it in say 5 minutes then we can look at diffs I guess | 17:34 |
corvus | ++ | 17:34 |
corvus | i'm going to look up some uuids while we wait | 17:34 |
corvus | clarkb: oh, actually, i'd like to know who's on the other end of sessions 288312462838726666, 288312462838726667, 288312462838726659 | 17:35 |
clarkb | also it ran pretty quickly implying that this wasn't too expensive for us | 17:35 |
corvus | do we have a way to figure that out? | 17:36 |
clarkb | I'm sure there is a way but I don't know off the top of my head | 17:36 |
clarkb | https://zookeeper.apache.org/doc/r3.5.9/zookeeperAdmin.html may tell us | 17:36 |
clarkb | cons maybe | 17:36 |
corvus | clarkb: are you just doing "nc localhost 2181"? | 17:36 |
corvus | oh you're using the rest api? | 17:37 |
clarkb | corvus: no I'm using the rest api because the wch* commands are not whitelisted | 17:37 |
corvus | clarkb: what's that look like? i've never used it | 17:37 |
corvus | thx for the pm :) | 17:38 |
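[Editor's note: the wch* four-letter words aren't in the default 4lw whitelist on the plain client socket, but ZooKeeper 3.5+ also serves them over the AdminServer HTTP interface, which is what clarkb is using here. A rough sketch of that access path follows; the long command names are taken from the ZooKeeper admin documentation, and the default AdminServer port of 8080 plus the host names are assumptions.]

```python
import json
import urllib.request

# The AdminServer (ZooKeeper >= 3.5) exposes each four-letter word as an
# HTTP endpoint under /commands/<long-name>, by default on port 8080.
FOUR_LETTER_TO_COMMAND = {
    "wchs": "watch_summary",    # counts of watched paths and watches
    "wchc": "watches",          # watches grouped by session (expensive)
    "wchp": "watches_by_path",  # watches grouped by znode path (expensive)
    "cons": "connections",      # per-session connection details
}

def admin_command_url(host, four_letter, port=8080):
    """Build the AdminServer URL for a four-letter-word command."""
    return "http://%s:%d/commands/%s" % (
        host, port, FOUR_LETTER_TO_COMMAND[four_letter])

def run_admin_command(host, four_letter, port=8080):
    """Fetch and decode one command's JSON output from a live server."""
    with urllib.request.urlopen(admin_command_url(host, four_letter, port)) as resp:
        return json.load(resp)
```

A snapshot like the one saved to `zk04:~clarkb/wchp.20210702-first` would then be `run_admin_command("zk04", "wchp")` redirected to a file.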
clarkb | 6659 is zuul02, 6667 is ze12, and 6666 is ze05 | 17:40 |
corvus | it is interesting that all of the requests have exactly those hosts... | 17:42 |
clarkb | there are only 6 total sessions from when I ran cons, this is I think expected because other clients will be talking to other servers in the cluster | 17:42 |
corvus | ah of course | 17:42 |
clarkb | let me see what the other sessions are, they could all be mergers maybe? | 17:42 |
clarkb | 288312462838726660 is also zuul02. Everything else is zuul mergers or nodepool builders | 17:44 |
corvus | clarkb: adding up all the watchers for /zuul/build-requests/unzoned/ca6cd1ae56cd406eb330ce0d5b6abbc4 gives us 13 sessions | 17:44 |
corvus | so we know every build request is getting a watch from every executor plus the scheduler | 17:44 |
clarkb | I'm running my followup wchp now | 17:45 |
clarkb | diffing the two shows that no build request watches went away, we only added new ones (which could be coincidence given the time some of those take to run?) | 17:47 |
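[Editor's note: the two-snapshot comparison described here can be mechanized. A minimal sketch, assuming each wchp snapshot has been decoded into a mapping of znode path to session ids (the real JSON shape may differ); the function name and path prefix are illustrative.]

```python
def diff_watch_snapshots(first, second, prefix="/zuul/build-requests/"):
    """Compare two wchp-style snapshots ({path: [session ids]}).

    Returns (removed, added): paths under `prefix` whose watches went
    away between snapshots, and paths that gained watches.  If `removed`
    stays empty while `added` keeps growing, the watches are leaking.
    """
    old = {p for p in first if p.startswith(prefix)}
    new = {p for p in second if p.startswith(prefix)}
    return sorted(old - new), sorted(new - old)
```

With the pattern observed in the channel, repeated runs would show an empty `removed` list and a steadily growing `added` list.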
corvus | i checked the first item in the first list: ff56c3afd7cc4ac0b143c54cfdc53a72 and zuul says it finished at 15:33:00,761 which is almost exactly the time of your first query. but it's still there in the second query. | 17:47 |
corvus | i think that's a leak | 17:47 |
clarkb | ya seems like it | 17:47 |
corvus | okay, i'm going to go off and stare at some code | 17:47 |
clarkb | corvus: maybe double check that we aren't returning an exception in the logs in the watch methods which prevents the watch from being cleaned up? | 17:48 |
corvus | (and maybe think about calling wchp from unit tests...) | 17:48 |
clarkb | since we need to return False alone iirc | 17:48 |
clarkb | corvus: does the request still exist too? | 17:48 |
corvus | good q | 17:49 |
clarkb | that might be another thing to check is if we are leaking the znode entirely? (Not sure how znode deletions interact with watches) | 17:49 |
corvus | we do have some exceptions with the executor client trying to return nodesets, we should look at that but it's likely a separate problem | 17:49 |
corvus | clarkb: Path /zuul/build-requests/unzoned/ff56c3afd7cc4ac0b143c54cfdc53a72 doesn't exist | 17:51 |
corvus | so just a watch leak | 17:51 |
corvus | (in general, you can watch paths that don't exist in zk) | 17:52 |
clarkb | got it | 17:53 |
clarkb | thinking out loud here if this was simply a matter of watch callbacks not being called quickly enough due to contention we would expect some of the watches to go away but more slowly than they are created. The diffs indicate this is unlikely the case as we only see new watches and none removed | 17:54 |
corvus | ++ | 17:55 |
corvus | clarkb: if there were an exception in our callback, it would prevent the watch from being removed. but i'm pretty sure it would be logged under kazoo.recipe.watchers | 17:59 |
corvus | i believe https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/watchers.py#L155 is the relevant kazoo code | 17:59 |
corvus | we probably should make our callback more robust, but i don't immediately see the error since i think that isn't happening. | 18:00 |
clarkb | corvus: could it be the if not content: return code at the top of the callback? | 18:02 |
clarkb | that would only trigger on the changed case but if CHANGED and DELETED are both triggered by deletion maybe that is the problem? | 18:03 |
clarkb | hrm except that we should get it fired for both events in that case | 18:03 |
clarkb | unless event.type is a mask | 18:03 |
clarkb | https://kazoo.readthedocs.io/en/latest/api/protocol/states.html#kazoo.protocol.states.EventType doesn't say anything about masks | 18:06 |
corvus | testing says we just get deleted on deleted | 18:08 |
clarkb | perhaps something to do with the closure | 18:13 |
clarkb | like maybe that has messed up kazoo's ability to count arguments and pass in event properly? (It shouldn't since we hide that from kazoo with the watch() signature) | 18:13 |
corvus | i can observe the behavior in tests | 18:16 |
corvus | by adding a wchp call to the end of a test | 18:16 |
clarkb | side note: a third wchp continues to show the same behavior, no build request watches are being cleaned up | 18:16 |
corvus | i think this may be a characteristic of the datawatch kazoo recipe | 18:20 |
corvus | i think there would have to be one more update after the delete in order for the watch to be triggered and not re-set | 18:22 |
corvus | i think https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/watchers.py#L191-L192 causes the watcher to be re-set after the delete happens | 18:23 |
clarkb | corvus: isn't that running before the callback on line 206? | 18:26 |
corvus | the java api has a removeWatches call, but i don't see that in kazoo | 18:26 |
clarkb | distinct to https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/watchers.py#L168 ? | 18:27 |
corvus | clarkb: exactly: delete in zk happens; zk sends client a "delete" event for the watcher; Datawatch calls getdata(); getdata() calls 'get' which fails, then calls 'exists' which succeeds and sets a watch. then our callback happens, and we tell it to stop. | 18:28 |
corvus | the system is left with a watch in place because of the exists call. | 18:28 |
corvus | i think the only way to get a datawatch to stop watching a path is to return False before the node is deleted, and then have one more event (which could be deletion) happen. i think the 'exists' handling was added so you could watch a node that doesn't exist yet; i don't think they thought through watching a node that gets deleted. | 18:30 |
corvus | basically, we want a datawatch which doesn't call exists | 18:31 |
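[Editor's note: the leak mechanism corvus describes can be simulated without a real cluster. The stub below models only the relevant behavior: on a delete event, the stock DataWatch recipe's data fetch falls back from `get` to `exists` and re-arms the watch, while the "don't call exists" variant leaves no watch behind. `FakeClient` and `on_delete_event` are illustrative stand-ins, not kazoo's actual internals.]

```python
class FakeClient:
    """Minimal stand-in for the parts of a ZooKeeper client that matter here."""
    def __init__(self):
        self.znodes = {}      # path -> data
        self.watches = set()  # paths with a watch armed

    def get(self, path, watch=False):
        if path not in self.znodes:
            raise KeyError(path)  # models kazoo.exceptions.NoNodeError
        if watch:
            self.watches.add(path)
        return self.znodes[path]

    def exists(self, path, watch=False):
        if watch:
            self.watches.add(path)
        return path in self.znodes

def on_delete_event(client, path, exists_fallback):
    """Model DataWatch's data fetch reacting to a DELETED event."""
    client.watches.discard(path)  # the fired watch is consumed
    try:
        client.get(path, watch=True)
    except KeyError:
        if exists_fallback:
            # stock DataWatch: fall back to exists(), re-arming the watch
            client.exists(path, watch=True)
        # variant without the fallback: the watch stays gone

c = FakeClient()
c.znodes["/zuul/build-requests/unzoned/abc"] = b"..."
del c.znodes["/zuul/build-requests/unzoned/abc"]

on_delete_event(c, "/zuul/build-requests/unzoned/abc", exists_fallback=True)
leaked = bool(c.watches)   # a watch is re-armed on the deleted path

on_delete_event(c, "/zuul/build-requests/unzoned/abc", exists_fallback=False)
clean = not c.watches      # no watch left behind
```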
clarkb | and the exists call sets a watch that is _get_data so it is recursive | 18:33 |
clarkb | and _get_data never seems to return False? | 18:34 |
clarkb | corvus: can we fix this with an event type check in _get_data? basically don't call the get or exists if event is deleted? we already know it isn't existing | 18:35 |
corvus | clarkb: yes -- but it may be an uphill battle to get that accepted; i could see the argument that they may have users that want watches to persist after deletion. so at least i think it would need to be a new optional argument to the constructor or something | 18:38 |
opendevreview | James E. Blair proposed zuul/zuul master: WIP datawatch https://review.opendev.org/c/zuul/zuul/+/799317 | 18:39 |
clarkb | corvus: ya that seems reasonable, or even a subclass: DeletableDataWatch | 18:39 |
corvus | clarkb: ^ that has the behavior we want (i removed the exists check, since we're not calling it until after we know it exists) | 18:39 |
clarkb | right we only add those watches when we have a child watch that indicates the entity exists. I suppose there is a race there if the buildrequest is immediately deleted for some reason? | 18:40 |
clarkb | but in that case we still don't want to watch it | 18:41 |
clarkb | we just want to move on. I think that means your fix is appropriate if you ignore private api overrides | 18:41 |
corvus | clarkb: yeah, either approach would work for us: option (clarkb): datawatch can be created regardless of whether the znode exists, but once it's deleted, watch is removed. option (corvus): watch can only be created if node exists, watch is removed if node deleted. | 18:41 |
corvus | yeah, maybe mine is slightly preferable in that it helps keep things tidy in the case of the create-then-delete race? | 18:42 |
clarkb | ya. Also I think you need to set data and stat since they may not be set if an exception is raised? | 18:42 |
clarkb | they don't get None values they will be undefined iirc | 18:42 |
clarkb | are watches an implied ephemeral type? if the connection goes away we'll auto purge the leaked watches? (thinking about what cleanup might entail) | 18:46 |
clarkb | corvus: we also use DataWatch() elsewhere in existing code. Any idea why this doesn't seem to be a problem for them? | 18:49 |
corvus | yeah, when clients disconnect, watches disappear | 18:49 |
clarkb | NodeRequests in particular seem to be using DataWatches | 18:49 |
corvus | clarkb: we stop watching node requests once they are fulfilled or canceled | 18:52 |
corvus | but deletion is a separate step | 18:52 |
clarkb | got it that stops the watching entirely so when the deletion event happens there is no watch to recursively keep watching with at that point | 18:52 |
corvus | yep | 18:53 |
corvus | there may be some edge cases there, but that's the typical workflow. i expect it would be okay to use whatever we come up with here for that as well | 18:53 |
clarkb | though that could potentially race, it is unlikely to happen in bulk like this (and possibly never at all depending on timing) | 18:53 |
clarkb | ++ | 18:53 |
corvus | another approach we could do is to avoid using DataWatch and set the watches ourselves... but honestly, datawatch is a relatively simple recipe, and we'd probably just end up reimplementing most of it. | 18:54 |
clarkb | This seems like a reasonable enough use case that we should maybe try upstreaming your child class? | 18:56 |
clarkb | Basically a case where you only want to watch something as long as it exists | 18:56 |
corvus | clarkb: re data+stat -- i don't think we need to set those? because of the return? | 18:56 |
clarkb | oh I missed there is a return in the exception handler. I thought it was falling through. Don't we need to fall through and call our watcher one last time for the DELETED event? | 18:57 |
corvus | yes | 18:57 |
clarkb | the deletion will happen and the watcher will fire for the DELETED event. The get will fail on NoNodeError and we then need to set None types for those values and fall through and let _log_func_exception run one more time I think | 18:58 |
clarkb | and then maybe try upstreaming that as ExistingDataWatch with some updates to the code to check self._ever_called and short circuit if that is false | 18:59 |
clarkb | ya I think in the exception handler block you want to do what you currently have if not self._ever_called else fallthrough and let the handler run one last time | 19:01 |
clarkb | Then you should have a watcher that does nothing if the data does not exist when created and will stop doing things when the data is eventually deleted | 19:03 |
clarkb | (if the watcher chooses to handle the deleted event I guess) | 19:03 |
corvus | or we could still allow it to be called the first time, just with null data/stat/event; then the callback would know that it was deleted | 19:04 |
clarkb | ya that would work too | 19:04 |
clarkb | if not data and not event then you don't exist at initialization. | 19:05 |
clarkb | and i guess letting the callback decide what to do in that case is more flexible | 19:05 |
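[Editor's note: the contract settled on here — the callback fires one last time with null data/stat on deletion, and returning False removes the watch — can be sketched from the consumer's side. The function name below is hypothetical; the `(data, stat, event)` signature and the return-False convention follow kazoo's DataWatch recipe as discussed above.]

```python
def build_request_callback(data, stat, event):
    """Callback for an ExistingDataWatch-style watcher.

    Called with (data, stat, event); event is None on the initial call.
    Returning False removes the watch.
    """
    if data is None:
        # Either the znode never existed (event is None at init) or it
        # was just deleted; either way there is nothing left to watch.
        return False
    # ... handle the updated build request data here ...
    return True

# Exercising the contract with fake events:
class FakeEvent:
    def __init__(self, type_):
        self.type = type_

assert build_request_callback(None, None, None) is False             # never existed
assert build_request_callback(b"{}", object(), FakeEvent("CHANGED")) is True
assert build_request_callback(None, None, FakeEvent("DELETED")) is False
```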
opendevreview | James E. Blair proposed zuul/zuul master: WIP datawatch https://review.opendev.org/c/zuul/zuul/+/799317 | 19:05 |
corvus | something like that ^ | 19:05 |
corvus | i'm going to get lunch, then i'll polish it up | 19:05 |
clarkb | left a comment. I think it is really close | 19:11 |
clarkb | And thinking out loud more, maybe it is better to vendor the parent class in the child (and separate them) so that we don't run into private api change problems? at least until we can get something upstreamed? | 19:14 |
clarkb | kazoo is apache2 licensed so that shouldn't pose a problem. We just do some attribution and should be good? | 19:15 |
corvus | clarkb: replied... can you take a look? | 20:47 |
clarkb | corvus: yup looking now | 20:54 |
clarkb | corvus: what about the case where you might want to keep listening after handling the DELETE event. I guess we're basically saying that isn't a thing here | 20:56 |
clarkb | this watcher is only good on valid nodes until they are deleted? | 20:57 |
clarkb | corvus: your comment makes sense though. And I agree with it as long as we don't intend on letting the DELETE event handler decide if the watch should stop or not | 20:59 |
corvus | clarkb: yes that's the intent; structurally, it's very difficult to let the callback decide that, because the datawatcher calls the zk method that would set the watch before calling the callback | 21:04 |
corvus | by very difficult, i think i mean 'impossible within the remaining design constraints of the DataWatch recipe' | 21:05 |
clarkb | I think you would have to use the self._ever_called and event.type attributes to decide that | 21:05 |
clarkb | if self._ever_called and event.type == DELETED and not data then you know it has gone away | 21:06 |
corvus | clarkb: no it's too late | 21:06 |
corvus | maybe i'm misunderstanding the suggestion | 21:06 |
clarkb | keeping the behavior simple like you've done makes sense to me regardless of trying to figure out the state machine | 21:06 |
corvus | i just want to check something | 21:07 |
corvus | you know the zookeeper watch is actually set by the "client.get" call, right? | 21:07 |
clarkb | corvus: I think you can call retry(exists, watcher) if the callback didn't return false and self._ever_called is true and data is None | 21:08 |
clarkb | corvus: yes both the get and the exists set it | 21:08 |
corvus | ok, then it's me that's missing something... :) | 21:08 |
corvus | but the callback is called after the exists call | 21:08 |
corvus | so how do we let the callback decide whether or not to set the watch? | 21:08 |
josefwells | Hey zuul-friends, getting close to my zuul on k8s. My executor logs show: 2021-07-02 18:54:47,152 DEBUG zuul.AnsibleJob.output: [e: e16b8800-db66-11eb-8e5e-3951beb15980] [build: c7f4c441db6b4223b861f95dd37b2deb] Ansible output: b'bwrap: Failed to make / slave: Permission denied' | 21:09 |
clarkb | corvus: you can move that call after I think | 21:09 |
clarkb | corvus: and then wrap it in the conditional to check if the callback wants you to put it back again | 21:09 |
clarkb | corvus: its more trouble than it is worth | 21:09 |
corvus | clarkb: it's before so that we pass in the stat | 21:09 |
corvus | clarkb: yeah that would mostly work, but i think it's potentially racy | 21:10 |
corvus | the "exists+watch" combo provides an important assurance that we don't miss any changes | 21:10 |
josefwells | but I'm not sure where to start looking for the issue.. any debug pointers? Everything is suspect to me :) | 21:10 |
corvus | (or get+watch) | 21:10 |
corvus | josefwells: make sure the executor container is running privileged | 21:11 |
josefwells | corvus: ok, thanks | 21:12 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Enable ZooKeeper 4 letter words https://review.opendev.org/c/zuul/zuul-jobs/+/799334 | 21:24 |
opendevreview | James E. Blair proposed zuul/zuul master: Add ExistingDataWatch class https://review.opendev.org/c/zuul/zuul/+/799317 | 21:26 |
corvus | clarkb: ^ final version | 21:26 |
clarkb | corvus: cool looking | 21:26 |
corvus | well, i mean, finally ready for real review :) | 21:26 |
corvus | i'll work on a kazoo pr meanwhile | 21:27 |
josefwells | corvus: ok, lets just say for the sake of hypotheticals that I can't run a privileged container in my cluster.. is there some other option? | 21:42 |
clarkb | josefwells: I think it is possible you could enable the specific items that bwrap needs to create its containers | 21:43 |
clarkb | a privileged container is a shortcut for that. That said there is a good chance that won't work because those may be considered to be effectively the same as privileged | 21:43 |
josefwells | bummer town, guess I just need to bite the bullet and own another server | 21:44 |
josefwells | trying to get out of that business | 21:44 |
josefwells | clarkb: where can I find a list of privileges I would need? | 21:45 |
opendevreview | Merged zuul/zuul-jobs master: Enable ZooKeeper 4 letter words https://review.opendev.org/c/zuul/zuul-jobs/+/799334 | 21:45 |
josefwells | clarkb: Oh, if it is ability to create containers, yeah, seems like that would be.. | 21:46 |
clarkb | josefwells: https://github.com/containers/bubblewrap#sandboxing I think that covers the possible options. I know the the user namespaces are required for zuul's use case. Not sure about the others | 21:46 |
clarkb | josefwells: I know that tobiash[m] has/had to run zuul on a separate openshift from their normal openshift in order to allow for privileged containers. corvus also helps manage the gerrit zuul which runs on gke. I think on gke it is a non issue though | 21:52 |
clarkb | it is possible tobiash[m] or tristanC may have input on how to make it run more happily | 21:52 |
opendevreview | Merged zuul/zuul master: Correct executor queue stats https://review.opendev.org/c/zuul/zuul/+/799223 | 21:53 |
josefwells | clarkb: thanks.. I've checked with my people.. I'll figure something out | 21:55 |
clarkb | corvus: I did leave a comment on the latest patchset to make sure I understand the test watch thing properly. Otherwise I think we can go with that | 22:11 |
corvus | clarkb: cool; want to take a quick look at https://github.com/python-zk/kazoo/pull/648 ? | 22:18 |
corvus | (while i review your comment) | 22:18 |
opendevreview | James E. Blair proposed zuul/zuul master: Add ExistingDataWatch class https://review.opendev.org/c/zuul/zuul/+/799317 | 22:22 |
corvus | clarkb: i did change something in that test method ^ | 22:22 |
clarkb | corvus: For the PR: small typo in the docstring for the new class: "functiol" instead of "function". The tests you added look good. The only thing I would caution with them is the event.wait(0.2) lines may be too short depending on how long the watchers take to process. But waiting for longer will make the tests slower | 22:25 |
corvus | oof, thx ill fix the typo | 22:25 |
corvus | 0.2 was used in other tests with similar 'expected failure' so i went with that; i agree it's concerning tho | 22:26 |
corvus | zuul-maint: if anyone else is around to review https://review.opendev.org/799317 that would be great; otherwise, i'll probably merge it in a little while just with +2 from clarkb and me, so we can restart opendev with that before the watches get too out of hand (i'm not too worried about them, but would like to nip it in the bud before it becomes a problem :) | 22:30 |
clarkb | I need to take a short break but I can be around for that later today | 22:36 |
clarkb | the merging and restart and all that I mean | 22:36 |
corvus | cool, i went ahead and +w'd it; if someone sees an issue, feel free to block it; otherwise, hopefully it'll merge within an hour or so and we can restart; i'm going to afk and get out of this chair for a bit :) | 22:39 |
fungi | heh, was about to say i'll take a look | 22:46 |
fungi | but there's still time if i catch a problem with it | 22:46 |
fungi | not that i expect to | 22:46 |
fungi | ahh, yeah this was tested on the z-j change i reviewed | 22:47 |
clarkb | The upstream PR is maybe a bit easier to read | 23:00 |
clarkb | you can pick out the new stuff a bit easier in the diff there, though corvus annotated the vendored code with comments | 23:00 |
fungi | yep | 23:01 |
clarkb | less than 11 minutes zuul says | 23:32 |
opendevreview | Merged zuul/zuul master: Add ExistingDataWatch class https://review.opendev.org/c/zuul/zuul/+/799317 | 23:38 |
corvus | yay! | 23:38 |
corvus | hrm, promote failed; i'll re-enqueue that | 23:41 |
corvus | oh actually, it failed in the cleanup part; i think it's okay, i'll double check dockerhub | 23:41 |
clarkb | ok | 23:42 |
corvus | no, i think it needs to re-run; i'll re-enqueue | 23:43 |
clarkb | corvus: I think it may be old still | 23:43 |
clarkb | ya | 23:44 |
clarkb | I'll notify the openstack release team though it should be a noop since it is the weekend for them | 23:44 |
corvus | something's fishy; it failed again, and there's no change_799317_latest tag | 23:46 |
clarkb | huh did the cleanup possibly fail because the image wasn't uploaded in the first place? | 23:48 |
clarkb | no it seems we uploaded to docker hub | 23:51 |
clarkb | https://zuul.opendev.org/t/zuul/build/08da96babad640a29136acdbb2244b49/log/job-output.txt#11370 is where that should have happened | 23:51 |
corvus | it's almost like dockerhub isn't being consistent | 23:51 |
corvus | but the "Get manifest" task shouldn't have worked without that tag | 23:53 |
corvus | docker pull zuul/zuul:change_799317_latest works for me | 23:53 |
corvus | so basically, the tag actually exists in the registry, but it doesn't show up in the web ui and the api can't delete it | 23:54 |
clarkb | weird | 23:55 |
clarkb | that would explain the failure then I guess | 23:55 |
corvus | yeah; we could manually pull that tag on all the hosts and tag it as latest; that's not great though since it could restart into something else | 23:56 |
clarkb | https://status.docker.com/pages/533c6539221ae15e3f000031 indicates an active issue with email | 23:56 |
clarkb | I doubt that is related to the issue we are seeing | 23:56 |
clarkb | corvus: any idea if we can manually update latest on dockerhub? | 23:56 |
corvus | maybe smtp is their distributed consensus algorithm | 23:56 |
clarkb | or maybe we already tried that | 23:56 |
corvus | clarkb: yeah i think it should work; basically do what the job does without the 'delete' call | 23:57 |
clarkb | ya that | 23:57 |
corvus | we could also add a 'failed_when: false' to that delete method | 23:57 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!