*** dviroel_ is now known as dviroel|out | 00:16 | |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Force use of scp rather than sftp when possible https://review.opendev.org/c/opendev/git-review/+/823413 | 12:39 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Fix submitting signed patches https://review.opendev.org/c/opendev/git-review/+/823318 | 13:35 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Drop support for Python 3.5 https://review.opendev.org/c/opendev/git-review/+/837222 | 13:35 |
opendevreview | Jeremy Stanley proposed opendev/git-review master: Clean up package metadata https://review.opendev.org/c/opendev/git-review/+/837228 | 15:57 |
mnaser | is there some sort of 'afs cache' ? | 16:25 |
mnaser | ok, literally right as i ask that, the file i needed appeared in afs :) | 16:26 |
fungi | if you're asking about delays, it depends on what file/url you're looking at as to what the update process is | 16:26 |
fungi | documentation site? package mirrors? release artifacts? something else? | 16:26 |
mnaser | https://tarballs.opendev.org/vexxhost/ansible-collection-atmosphere/ | 16:27 |
mnaser | the generated wheels there took around 4 minutes to show up after the promote job was done | 16:27 |
fungi | mnaser: yeah, so what happens is the publish job records the artifacts into the read-write afs volume for that site, and then a cronjob periodically (every ~5 minutes) runs through all the afs volumes for the static.o.o sites (including tarballs) and performs a vos release to sync them to the read-only replica which backs those sites | 16:34 |
fungi | usually you should see it appear within 5 minutes, but if there are particularly large content updates for any one of those volumes it can delay things since they're updated serially in order to avoid saturating the connection | 16:35 |
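For illustration, here is a minimal sketch of the periodic sync fungi describes, assuming a cron-driven script on the AFS admin host that walks the site volumes and runs the OpenAFS `vos release` command serially; the volume names and the `-localauth` invocation are assumptions for the example, not the actual OpenDev cronjob:

```python
#!/usr/bin/env python3
"""Illustrative only: release each read-write AFS volume to its read-only
replicas one at a time, roughly what the every-~5-minute cronjob does for
the static.opendev.org sites. Volume names below are placeholders."""
import subprocess

SITE_VOLUMES = [
    "mirror.tarballs",  # hypothetical volume behind tarballs.opendev.org
    "mirror.docs",      # hypothetical volume behind a docs site
]

def release(volume: str) -> None:
    # "vos release" pushes the read-write volume's current contents out to
    # the read-only replicas that the web servers actually serve from.
    subprocess.run(["vos", "release", volume, "-localauth"], check=True)

if __name__ == "__main__":
    # Serial on purpose: releasing one volume at a time avoids saturating
    # the connection when one of them has a particularly large update.
    for vol in SITE_VOLUMES:
        release(vol)
```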
mnaser | aaah, got it, that makes sense now, thanks fungi ! | 16:39 |
fungi | my pleasure | 16:41 |
opendevreview | Mohammed Naser proposed opendev/system-config master: docker: add arm64 mirroring https://review.opendev.org/c/opendev/system-config/+/837232 | 18:10 |
mnaser | supporting arm64: https://www.youtube.com/watch?v=AbSehcT19u0 | 18:10 |
hrw | mnaser: commented. | 18:52 |
mnaser | I’ve been seeing this weird experience where jobs with an explicit nodeset of “ubuntu-focal” take longer to start than ones that don’t have a nodeset at all (which defaults to focal..) | 20:47 |
fungi | mnaser: that does indeed seem weird. both should be served the same under the hood since the jobs without an explicit nodeset actually have an explicit nodeset through inheritance: https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/jobs.yaml#L65 | 21:08 |
fungi | it's not any less explicit, just inherited from a parent job. the result would be identical in either case though... zuul putting out a node request for that same nodeset definition | 21:09 |
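As a hedged sketch of the inheritance fungi is pointing at (the job names here are invented; only the pattern matches the linked base job): the parent job pins the nodeset, and a child job that declares none still ends up issuing the same node request.

```yaml
# Illustrative Zuul configuration, not OpenDev's actual jobs.yaml.
- job:
    name: base
    nodeset:
      nodes:
        - name: ubuntu-focal
          label: ubuntu-focal

- job:
    name: my-focal-job
    parent: base
    # No nodeset here: ubuntu-focal is inherited from "base", so Zuul
    # puts out the same node request as if it were spelled out explicitly.
```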
mnaser | fungi: yeah.. maybe it’s something else that is causing the delay… | 21:12 |
mnaser | In a specific case, it took 8 minutes for a job to start | 21:13 |
mnaser | https://zuul.opendev.org/t/vexxhost/builds .. ansible-collection-atmosphere-build-images-wallaby-amd64 finished at 20:44 and ansible-collection-atmosphere-build-images-manifest-wallaby started at 20:50 | 21:14 |
mnaser | So almost 6 minutes waiting and things are pretty idle right now | 21:14 |
fungi | node launch failures can cause significant delays since the launcher will lock the request while it waits for a node to boot (if none of them have one waiting via min-ready) | 21:14 |
fungi | though i think the launcher waits for up to 10 minutes for the nodes to become reachable, so if the delay was less than that it could just be some providers taking longer than usual to boot | 21:15 |
fungi | any correlation between start delays and the providers mentioned in the zuul inventory? | 21:16 |
fungi | we also have some graphs of boot times, i think. i'll look | 21:16 |
mnaser | fungi: there might have been some failures in providers since I saw some jobs retrying too but didn’t dig too deep as to why they did | 21:20 |
mnaser | But yeah, in general focal nodes in the vexxhost tenant seem to take a little bit longer to come by. Actually, I find that we get an arm64 node WAY faster, even if the other tenants are relatively idle | 21:20 |
mnaser | In this case the amd64 job started a whole 4 minutes later | 21:21 |
fungi | https://grafana.opendev.org/d/6c807ed8fd/nodepool?orgId=1&viewPanel=18 | 21:22 |
mnaser | I wonder if there just isn’t enough min-ready and my wait time is say… waiting for rax | 21:23 |
fungi | looks like ovh nodes were taking a while at times | 21:23 |
fungi | yeah, i mean we don't run many arm64 jobs so the min-ready there might be covering you and explain the faster starts on a fairly quiet sunday | 21:23 |
mnaser | yeah I think that might add up to the reasoning why | 21:24 |
fungi | given the volume of jobs we run most of the time, we optimize for throughput and resource conservation over immediacy of results | 21:27 |
fungi | to make any impact on responsiveness at higher-volume times, we'd need to carry a very large min-ready for some of our labels | 21:28 |
fungi | which would then result in a lot of nodes sitting booted but idle at times like now | 21:28 |
fungi | also no amount of min-ready would make any difference in response times when we're running at full capacity, of course | 21:30 |
fungi | and could even result in a slight reduction in effective capacity if we end up aggressively booting labels which aren't in as high demand at those times | 21:31 |
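For context, a hedged sketch of the Nodepool min-ready setting being weighed in this exchange; the labels and counts are illustrative, not OpenDev's real configuration:

```yaml
# Illustrative nodepool.yaml fragment. min-ready keeps a few nodes booted
# and waiting so requests at quiet times start quickly, at the cost of
# idle capacity; when the system is at full capacity those ready nodes
# are consumed immediately, so min-ready can't improve response times then.
labels:
  - name: ubuntu-focal
    min-ready: 1
  - name: ubuntu-focal-arm64
    min-ready: 1
```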
mnaser | yeah, scheduling is a tricky thing since there are so many varying types of systems | 21:38 |
*** rlandy is now known as rlandy|out | 21:39 | |
opendevreview | Merged openstack/project-config master: opendev/gerrit : retire project https://review.opendev.org/c/openstack/project-config/+/833939 | 23:49 |