*** mugsie has quit IRC | 01:00 | |
*** mugsie has joined #openstack-release | 01:04 | |
*** dave-mccowan has quit IRC | 01:20 | |
*** armax has quit IRC | 01:25 | |
*** armstrong has quit IRC | 03:55 | |
*** ykarel has joined #openstack-release | 04:31 | |
*** ykarel has quit IRC | 04:35 | |
*** ykarel has joined #openstack-release | 04:36 | |
*** ykarel has quit IRC | 05:10 | |
*** ykarel has joined #openstack-release | 05:16 | |
*** ykarel_ has joined #openstack-release | 05:18 | |
*** ykarel has quit IRC | 05:21 | |
*** vishalmanchanda has joined #openstack-release | 05:29 | |
*** armax has joined #openstack-release | 05:32 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-release | 05:33 | |
*** e0ne has joined #openstack-release | 06:22 | |
*** e0ne has quit IRC | 06:24 | |
*** e0ne has joined #openstack-release | 07:06 | |
*** sboyron has joined #openstack-release | 07:08 | |
*** dtantsur|afk is now known as dtantsur | 07:20 | |
*** rpittau|afk is now known as rpittau | 07:42 | |
*** e0ne has quit IRC | 07:45 | |
*** ykarel_ is now known as ykarel | 07:56 | |
*** slaweq has joined #openstack-release | 08:11 | |
*** ykarel has quit IRC | 08:13 | |
*** e0ne has joined #openstack-release | 08:14 | |
*** ykarel has joined #openstack-release | 08:16 | |
*** hberaud has joined #openstack-release | 08:19 | |
*** jbadiapa has quit IRC | 08:28 | |
*** tosky has joined #openstack-release | 08:36 | |
*** jbadiapa has joined #openstack-release | 08:38 | |
*** e0ne has quit IRC | 08:54 | |
openstackgerrit | Hervé Beraud proposed openstack/releases master: ignore trailing projects in R-2 https://review.opendev.org/755861 | 09:40 |
openstackgerrit | Hervé Beraud proposed openstack/releases master: ignore trailing projects in R-2 https://review.opendev.org/755861 | 09:44 |
*** ykarel_ has joined #openstack-release | 10:06 | |
*** ykarel has quit IRC | 10:09 | |
*** hberaud has quit IRC | 10:20 | |
*** hberaud has joined #openstack-release | 10:21 | |
openstackgerrit | Hervé Beraud proposed openstack/releases master: Adding a tool to track project who need to drop eol branches https://review.opendev.org/758990 | 10:46 |
*** ykarel_ is now known as ykarel | 11:17 | |
*** e0ne has joined #openstack-release | 11:37 | |
*** sboyron has quit IRC | 11:45 | |
hberaud | smcginnis: I wonder why your changes (https://review.opendev.org/#/c/759309/) are not present => https://releases.openstack.org/reference/reviewer_guide.html | 12:08 |
hberaud | I didn't see a job failure | 12:08 |
hberaud | logs seem ok => https://zuul.opendev.org/t/openstack/build/caffc17288b84f4ab6a048ab2c7f5614/log/job-output.txt | 12:17 |
smcginnis | hberaud: That's odd. I wonder if there's an AFS sync issue or something like that again. | 12:29 |
smcginnis | fungi: Do you have a moment to check on that? ^ | 12:29 |
hberaud | I didn't see anything weird in the logs | 12:29 |
fungi | looking | 12:30 |
*** dtantsur is now known as dtantsur|brb | 12:34 | |
fungi | the bottom of https://releases.openstack.org/reference/reviewer_guide.html says it was last "updated: Mon Oct 26 13:04:28 2020, commit bcc0fbced" | 12:37 |
fungi | https://zuul.opendev.org/t/openstack/builds/?job_name=publish-tox-docs-releases shows that was the last commit to build | 12:38 |
fungi | and that's not the most recent merge commit | 12:40 |
fungi | b5cc5e9 is the current branch tip, not bcc0fbc | 12:41 |
fungi | oh, my terminal was wrapping something around, 0c59b5c is the current branch tip | 13:01 |
fungi | okay, so that was the problem | 13:01 |
fungi | hberaud: smcginnis: the builds completed out of order. the publish-tox-docs-releases build for bcc0fbc finished after the build for 0c59b5c | 13:03 |
fungi | they started out of order, which usually indicates a restart for the job; the first one may have lost its node somehow | 13:06 |
*** dave-mccowan has joined #openstack-release | 13:07 | |
hberaud | fungi: I see thanks | 13:16 |
hberaud | so normally the doc will be fixed during the next build | 13:17 |
fungi | yes | 13:18 |
fungi | we just happened to publish the second-most-recent state immediately after the most-recent state | 13:18 |
fungi | there are ways to avoid that, like using a semaphore in the job or running that job in a supercedent pipeline | 13:19 |
hberaud | fungi: AFAIK we already have a semaphore for this job https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L413 | 13:25 |
hberaud | https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L432 | 13:25 |
hberaud | as we already faced race condition with access during doc deployment with rsync | 13:26 |
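For reference, the semaphore hberaud links to is attached to the docs job in zuul.d/jobs.yaml roughly as sketched below. This is a minimal sketch assuming a max-1 semaphore; the semaphore name shown is illustrative, not necessarily the one used in project-config. A max: 1 semaphore guarantees mutual exclusion (only one build syncing the docs site at a time), which is why it prevents concurrent uploads but not out-of-order ones.

    # zuul.d/jobs.yaml (sketch); the semaphore name here is an assumption
    - semaphore:
        name: releases-docs-site
        max: 1

    - job:
        name: publish-tox-docs-releases
        # ...other job settings elided...
        semaphore: releases-docs-site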
*** brinzhang_ has quit IRC | 13:37 | |
hberaud | our current semaphore ensures that our resource (the doc server) is only synced by one job at a time, but it doesn't protect us in this use case (when the builds complete out of order): the resource is available, so the upload can be launched (if I correctly understood our use case) | 13:43 |
*** dave-mccowan has quit IRC | 13:46 | |
*** dave-mccowan has joined #openstack-release | 13:50 | |
smcginnis | Thanks fungi. That makes sense. | 14:02 |
fungi | oh, good point, that prevents two builds from running concurrently, but doesn't necessarily prevent them from running out of order | 14:02 |
smcginnis | We should have another build sometime soon, so it all sorts out in the end. | 14:02 |
fungi | a supercedent pipeline would preserve ordering | 14:02 |
*** dtantsur|brb is now known as dtantsur | 14:03 | |
fungi | (as would a dependent pipeline) | 14:03 |
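The distinction fungi is drawing comes from the pipeline's manager setting: an independent pipeline runs every enqueued ref, while a supercedent pipeline keeps only the most recently enqueued item per project/branch, so an older ref can never be published after a newer one. A rough sketch of a supercedent pipeline definition, with an illustrative trigger rather than the actual opendev configuration:

    # Sketch of a supercedent pipeline definition (name and trigger are illustrative)
    - pipeline:
        name: post
        description: Runs jobs after a ref is updated; only the newest item per branch is kept.
        manager: supercedent
        precedence: low
        post-review: true
        trigger:
          gerrit:
            - event: ref-updated
              ref: ^refs/heads/.*$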
*** slaweq is now known as slaweq|ptg | 14:07 | |
hberaud | fungi: I don't expect this kind of scenario to happen on every build, but if supercedent can help us definitively fix similar bugs, why not | 14:19 |
fungi | there are tradeoffs | 14:21 |
fungi | release-post is set to independent (unlike post which is supercedent) because you're also running some jobs in there that need to run for every enqueued ref, though the docs job isn't one of them | 14:22 |
fungi | it could be moved to an existing supercedent pipeline like post or promote, but those pipelines don't have as high of a priority as the release and release-post pipelines so jobs in them can take longer to run when we're short on resources | 14:23 |
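Concretely, the option fungi describes would amount to listing the docs job under an existing supercedent pipeline in the project's pipeline configuration, something like the sketch below. The layout is an assumption for illustration, not the actual project-config contents; at this point publish-tox-docs-releases still runs in release-post.

    # Sketch: keep tag-releases in release-post, move only the docs job
    - project:
        name: openstack/releases
        release-post:
          jobs:
            - tag-releases              # needs to run for every enqueued ref
        promote:
          jobs:
            - publish-tox-docs-releases # supercedent: only the newest ref publishes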
hberaud | I see | 14:26 |
fungi | or we could create a release-specific supercedent pipeline, but seems like it would be overkill just for that one job | 14:26 |
fungi | or it's possible tag-releases no longer needs to run for every ref, and would be fine just running the most recently enqueued ref, but i'm not deeply familiar with how that job decides what should be tagged | 14:27 |
hberaud | most of our doc changes are not really high priority, except the last changes of a cycle when we update the cycle status | 14:28 |
fungi | digging deeper in the scheduler's debug log, it doesn't appear there was any retry for the earlier build, they simply started out of order. the only other cause i can think of is if the node request for that build of the earlier ref failed and had to be resubmitted, and since the later ref got the node request for its build satisfied sooner it ran first | 14:35 |
hberaud | I don't think we want to move release-post to supercedent; I would personally prefer to move this doc job to an existing supercedent pipeline, it looks safer for us | 14:38 |
hberaud | also AFAIK I haven't seen a similar issue in the past, and now we are aware of it, so maybe we could leave our existing config as it is for now and move to another pipeline only if we face a similar issue again; that would avoid the tradeoffs for now, thoughts? | 14:44 |
fungi | yeah, i'm still trying to nail down the precise circumstances which allowed this to happen, but i'm getting the feeling it would be a very rare race condition | 14:46 |
hberaud | yep, same thinking here | 14:47 |
hberaud | I don't want to sacrifice our pipeline priority for a hypothetical and unlikely use case | 14:48 |
hberaud | fungi: anyway thanks for your great help | 14:49 |
fungi | my pleasure, i'm still trying to get to the bottom of this because i want to understand it better in case it comes up again | 14:51 |
hberaud | fungi: sure, don't hesitate to share your observations, I'm interested in this topic | 14:54 |
fungi | it gets stranger. so i've confirmed that the ref-updated events from those changes merging did arrive in order, but the scheduler initiated the node requests for them out of sequence... and nearly an hour later | 14:56 |
hberaud | I see | 14:57 |
fungi | i wonder if the mergers were overloaded, and eventually returned the build refs out of sequence | 14:57 |
clarkb | node requests can be fulfilled out of order | 15:10 |
clarkb | the reason for this is that a specific provider grabs each request, and they may fulfill them in different orders. If we need to sequence them, that has to happen prior to submitting the node requests | 15:11 |
fungi | in this case the scheduler submitted the node requests out of order, though it looks like the mergers returned refs in order | 15:17 |
fungi | but there's nearly an hour between when the mergers return and when the node requests are submitted, so i'm still trying to work out what happened there | 15:18 |
fungi | clarkb: do you happen to know if semaphores block nodes from being requested? i guess that would make sense. seems like maybe there was a series of merges for openstack/releases so this may have been the end of a long chain of serialized builds. maybe the ordering for submitting node requests gets racy | 15:19 |
clarkb | I think semaphores should order the node request submissions | 15:20 |
fungi | i don't see any indication that the node requests failed and were resubmitted | 15:39 |
fungi | this is a heavily filtered timeline: http://paste.openstack.org/show/799436 | 15:40 |
*** ykarel has quit IRC | 15:40 | |
fungi | you can see the merge requests happened and returned immediately after the triggering events | 15:40 |
fungi | but the node requests were added out of order | 15:41 |
fungi | but also the only explanation i have for the ~45-minute gap between when the mergers returned refs and when the node requests were submitted is that there were other builds queued/underway with the same semaphore | 15:42 |
*** sboyron has joined #openstack-release | 15:46 | |
*** e0ne has quit IRC | 15:59 | |
*** tosky_ has joined #openstack-release | 16:07 | |
*** tosky has quit IRC | 16:08 | |
*** slaweq|ptg is now known as slaweq | 16:08 | |
clarkb | if the node requests fail the job should have a node failure iirc | 16:22 |
clarkb | they don't get resubmitted | 16:22 |
fungi | yeah, so that definitely wasn't it | 16:36 |
fungi | i don't have a good explanation for why the scheduler seems to have submitted these node requests out of sequence | 16:36 |
fungi | corvus: no idea if you're around, but maybe you have a suggestion of what to look at next | 16:36 |
clarkb | corvus may have thoughts? | 16:36 |
*** tosky_ is now known as tosky | 16:56 | |
*** ricolin has quit IRC | 17:10 | |
*** vishalmanchanda has quit IRC | 17:30 | |
*** rpittau is now known as rpittau|afk | 17:34 | |
*** dtantsur is now known as dtantsur|afk | 18:13 | |
*** hberaud has quit IRC | 18:25 | |
*** hberaud has joined #openstack-release | 19:54 | |
*** hberaud has quit IRC | 20:05 | |
*** melwitt has joined #openstack-release | 20:07 | |
*** slaweq has quit IRC | 20:16 | |
*** rpittau|afk is now known as rpittau | 20:18 | |
*** slaweq has joined #openstack-release | 20:19 | |
*** hberaud has joined #openstack-release | 20:23 | |
*** otherwiseguy_ is now known as otherwiseguy | 20:33 | |
*** gouthamr has quit IRC | 20:58 | |
*** gouthamr has joined #openstack-release | 20:58 | |
*** gouthamr has quit IRC | 20:59 | |
*** gouthamr has joined #openstack-release | 20:59 | |
*** rpittau is now known as rpittau|afk | 21:03 | |
*** slaweq has quit IRC | 21:04 | |
*** slaweq has joined #openstack-release | 21:22 | |
*** hberaud has quit IRC | 21:28 | |
*** sboyron has quit IRC | 22:06 | |
*** slaweq has quit IRC | 23:10 | |
*** tosky has quit IRC | 23:52 | |
*** armax has quit IRC | 23:55 |