*** dviroel_ is now known as dviroel|out | 00:16 | |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Force use of scp rather than sftp when possible https://review.opendev.org/c/opendev/git-review/+/823413 | 12:39 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Fix submitting signed patches https://review.opendev.org/c/opendev/git-review/+/823318 | 13:35 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Drop support for Python 3.5 https://review.opendev.org/c/opendev/git-review/+/837222 | 13:35 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Clean up package metadata https://review.opendev.org/c/opendev/git-review/+/837228 | 15:57 |
mnaser | is there some sort of 'afs cache' ? | 16:25 |
mnaser | ok, literally right as i ask that, the file i needed appeared in afs :) | 16:26 |
fungi | if you're asking about delays, it depends on what file/url you're looking at as to what the update process is | 16:26 |
fungi | documentation site? package mirrors? release artifacts? something else? | 16:26 |
mnaser | https://tarballs.opendev.org/vexxhost/ansible-collection-atmosphere/ | 16:27 |
mnaser | the generated wheels there took around 4 minutes to show up after the promote job was done | 16:27 |
fungi | mnaser: yeah, so what happens is the publish job records the artifacts into the read-write afs volume for that site, and then a cronjob periodically (every ~5 minutes) runs through all the afs volumes for the static.o.o sites (including tarballs) and performs a vos release to sync them to the read-only replica which backs those sites | 16:34 |
fungi | usually you should see it appear within 5 minutes, but if there are particularly large content updates for any one of those volumes it can delay things since they're updated serially in order to avoid saturating the connection | 16:35 |
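For illustration, here is a minimal sketch of the periodic sync fungi describes, assuming a cron-driven script on the AFS admin host that walks the site volumes and runs the OpenAFS `vos release` command serially; the volume names and the `-localauth` invocation are assumptions for the example, not the actual OpenDev cronjob:

```python
#!/usr/bin/env python3
"""Illustrative only: release each read-write AFS volume to its read-only
replicas one at a time, roughly what the every-~5-minute cronjob does for
the static.opendev.org sites. Volume names below are placeholders."""
import subprocess

SITE_VOLUMES = [
    "mirror.tarballs",  # hypothetical volume behind tarballs.opendev.org
    "mirror.docs",      # hypothetical volume behind a docs site
]

def release(volume: str) -> None:
    # "vos release" pushes the read-write volume's current contents out to
    # the read-only replicas that the web servers actually serve from.
    subprocess.run(["vos", "release", volume, "-localauth"], check=True)

if __name__ == "__main__":
    # Serial on purpose: releasing one volume at a time avoids saturating
    # the connection when one of them has a particularly large update.
    for vol in SITE_VOLUMES:
        release(vol)
```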
mnaser | aaah, got it, that makes sense now, thanks fungi ! | 16:39 |
fungi | my pleasure | 16:41 |
opendevreview | Mohammed Naser proposed opendev/system-config master: docker: add arm64 mirroring https://review.opendev.org/c/opendev/system-config/+/837232 | 18:10 |
mnaser | supporting arm64: https://www.youtube.com/watch?v=AbSehcT19u0 | 18:10 |
hrw | mnaser: commented. | 18:52 |
mnaser | I’ve been seeing this weird experience where jobs with an explicit nodeset of “ubuntu-focal” take longer to start than ones that don’t have a nodeset at all (which defaults to focal..) | 20:47 |
fungi | mnaser: that does indeed seem weird. both should be served the same under the hood since the jobs without an explicit nodeset actually have an explicit nodeset through inheritance: https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/jobs.yaml#L65 | 21:08 |
fungi | it's not any less explicit, just inherited from a parent job. the result would be identical in either case though... zuul putting out a node request for that same nodeset definition | 21:09 |
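As a hedged sketch of the inheritance fungi is pointing at (the job names here are invented; only the pattern matches the linked base job): the parent job pins the nodeset, and a child job that declares none still ends up issuing the same node request.

```yaml
# Illustrative Zuul configuration, not OpenDev's actual jobs.yaml.
- job:
    name: base
    nodeset:
      nodes:
        - name: ubuntu-focal
          label: ubuntu-focal

- job:
    name: my-focal-job
    parent: base
    # No nodeset here: ubuntu-focal is inherited from "base", so Zuul
    # puts out the same node request as if it were spelled out explicitly.
```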
mnaser | fungi: yeah.. maybe it’s something else that is causing the delay… | 21:12 |
mnaser | In a specific case, it took 8 minutes for a job to start | 21:13 |
mnaser | https://zuul.opendev.org/t/vexxhost/builds .. ansible-collection-atmosphere-build-images-wallaby-amd64 finished at 20:44 and ansible-collection-atmosphere-build-images-manifest-wallaby started at 20:50 | 21:14 |
mnaser | So almost 6 minutes waiting and things are pretty idle right now | 21:14 |
fungi | node launch failures can cause significant delays since the launcher will lock the request while it waits for a node to boot (if none of them have one waiting via min-ready) | 21:14 |
fungi | though i think the launcher waits for up to 10 minutes for the nodes to become reachable, so if the delay was less than that it could just be some providers taking longer than usual to boot | 21:15 |
fungi | any correlation between start delays and the providers mentioned in the zuul inventory? | 21:16 |
fungi | we also have some graphs of boot times, i think. i'll look | 21:16 |
mnaser | fungi: there might have been some failures in providers since I saw some jobs retrying too but didn’t dig too deep as to why they did | 21:20 |
mnaser | But yeah, in general focal nodes in the vexxhost tenant seem to take a little bit longer to come by. Actually, I find that we get an arm64 node WAY faster, even if the other tenants are relatively idle | 21:20 |
mnaser | In this case the amd64 job started a whole 4 minutes later | 21:21 |
fungi | https://grafana.opendev.org/d/6c807ed8fd/nodepool?orgId=1&viewPanel=18 | 21:22 |
mnaser | I wonder if there just isn’t enough min-ready and my wait time is say… waiting for rax | 21:23 |
fungi | looks like ovh nodes were taking a while at times | 21:23 |
fungi | yeah, i mean we don't run many arm64 jobs so the min-ready there might be covering you and explain the faster starts on a fairly quiet sunday | 21:23 |
mnaser | yeah I think that might add up to the reasoning why | 21:24 |
fungi | given the volume of jobs we run most of the time, we optimize for throughput and resource conservation over immediacy of results | 21:27 |
fungi | to make any impact on responsiveness at higher-volume times, we'd need to carry a very large min-ready for some of our labels | 21:28 |
fungi | which would then result in a lot of nodes sitting booted but idle at times like now | 21:28 |
fungi | also no amount of min-ready would make any difference in response times when we're running at full capacity, of course | 21:30 |
fungi | and could even result in a slight reduction in effective capacity if we end up aggressively booting labels which aren't in as high demand at those times | 21:31 |
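For context, a hedged sketch of the Nodepool min-ready setting being weighed in this exchange; the labels and counts are illustrative, not OpenDev's real configuration:

```yaml
# Illustrative nodepool.yaml fragment. min-ready keeps a few nodes booted
# and waiting so requests at quiet times start quickly, at the cost of
# idle capacity; when the system is at full capacity those ready nodes
# are consumed immediately, so min-ready can't improve response times then.
labels:
  - name: ubuntu-focal
    min-ready: 1
  - name: ubuntu-focal-arm64
    min-ready: 1
```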
mnaser | yeah, scheduling is a tricky thing since there are so many varying types of systems | 21:38 |
*** rlandy is now known as rlandy|out | 21:39 | |
opendevreview | Merged openstack/project-config master: opendev/gerrit : retire project https://review.opendev.org/c/openstack/project-config/+/833939 | 23:49 |