*** abhishekk is now known as akekane|home | 05:21 | |
*** akekane|home is now known as abhishekk | 05:22 | |
*** ykarel_ is now known as ykarel | 05:43 | |
*** jpena|off is now known as jpena | 07:35 | |
*** rpittau|afk is now known as rpittau | 07:52 | |
*** akekane_ is now known as abhishekk | 08:26 | |
*** ykarel is now known as ykarel|lunch | 08:32 | |
*** ykarel|lunch is now known as ykarel | 10:06 | |
*** jcapitao is now known as jcapitao_lunch | 10:41 | |
*** rlandy is now known as rlandy|ruck | 11:08 | |
yoctozepto | I am wondering why Masakari renos do not show up https://docs.openstack.org/releasenotes/masakari-dashboard/unreleased.html the promote job has run successfully but its effects cannot be seen | 11:35 |
*** jpena is now known as jpena|lunch | 11:39 | |
fungi | yoctozepto: do you see them being built in the docs job? | 11:40 |
fungi | er, nevermind. openstack projects seem to use a separate releasenotes job | 11:41 |
*** hberaud_ is now known as hberaud | 11:44 | |
fungi | ahh, the releasenotes are built as part of the docs job but then the promote-openstack-releasenotes job pulls the release notes from the created docs tarball artifact and publishes those separately | 11:47 |
fungi | for example, https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_2e4/798842/3/gate/build-openstack-releasenotes/2e4842e/docs-html.tar.gz is the one https://zuul.opendev.org/t/openstack/build/7d5734d34dff4f669f0b5584aff2e15e tried to publish them from | 11:48 |
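A quick way to confirm the note actually made it into that build artifact before publication is to list the tarball's contents; a minimal sketch using the URL quoted above:

```sh
# Sketch: confirm the release notes pages are present in the build artifact
# that the promote job publishes from.
artifact_url='https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_2e4/798842/3/gate/build-openstack-releasenotes/2e4842e/docs-html.tar.gz'
curl -sL "$artifact_url" | tar tzf - | grep unreleased.html
```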
fungi | and i see a release note in the unreleased.html that tarball includes, which is not appearing on the docs site | 11:51 |
fungi | i can see the new note and a build timestamp from saturday in /afs/.openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html (the read-write volume path) | 11:56 |
fungi | but the docs site publishes from a read-only replica at /afs/openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html which lacks that note and has a build timestamp from months ago (2021-05-24) | 11:57 |
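The two AFS paths compared here differ only in the leading dot; a stale read-only replica shows up directly when diffing the served copy against the read-write one:

```sh
# The dotted path is the read-write volume the promote job writes to; the
# undotted path is the read-only replica the docs site serves from. A
# non-empty diff means the replica has not been released since the write.
diff -q /afs/.openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html \
        /afs/openstack.org/docs/releasenotes/masakari-dashboard/unreleased.html
```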
fungi | so looks like that volume may have stale content, vos release could be broken, i'll look into that possibility | 11:58 |
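A minimal sketch of checking whether replication is stuck, assuming the volume is simply named "docs":

```sh
# If the replicas are stale, the update/release timestamps reported here will
# disagree; a locked VLDB entry is a hint that an earlier vos release was
# interrupted and never completed.
vos examine docs
vos listvldb -name docs
```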
fungi | ianw: if you're still around, where did the vos release cronjob for static site volumes move to? | 12:04 |
yoctozepto | thanks fungi for handling the issue | 12:05 |
fungi | oh, right, we moved it from cron to ansible, so i guess bridge initiates and logs it now | 12:06 |
fungi | looks like vos release has been failing for the docs volume for at least a week (that's as far back as our log retention goes) | 12:10 |
fungi | i suppose it's possible we rebooted afs01.dfw in the middle of a vos release of the docs volume when we were doing kernel updates a few weeks back | 12:14 |
fungi | there's a tarballs volume update in progress since a few minutes ago, but once it completes i'll take the lock for that cronjob in a root screen session on mirror-update and start trying to manually release the docs volume | 12:19 |
fungi | yeah, vos release says the vldb entry is already locked. i'll try to clear it | 12:27 |
fungi | unlocked it, vos release is running with -localauth under a root screen session on afs01.dfw now | 12:28 |
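Roughly the recovery sequence described here, assuming the volume name is "docs"; -localauth uses the server key, which is why it is run on afs01.dfw itself:

```sh
# Clear the stale lock left behind by the interrupted release, then push the
# read-write contents out to the read-only replicas.
vos unlock docs -localauth
vos release docs -localauth -verbose
```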
*** jpena|lunch is now known as jpena | 12:32 | |
fungi | it hasn't said anything to the tty yet though, leading me to wonder if there's an existing transaction it's waiting to see complete | 12:48 |
fungi | if this takes much longer i'll propose a patch to switch to serving sites from the read-write volume until we can get the read-only replicas back in sync | 12:50 |
yoctozepto | thanks | 13:04 |
fungi | yoctozepto: it finished. check that https://docs.openstack.org/releasenotes/masakari-dashboard/unreleased.html now shows what you expect when refreshed | 13:13 |
fungi | #status log Deleted stale vldb entry for AFS docs volume and ran vos release manually to catch up the read-only replicas | 13:13 |
opendevstatus | fungi: finished logging | 13:13 |
yoctozepto | fungi: thanks! I assume I should be expecting it now to always stay in sync? or does it not run on promote, only on some schedule? | 14:23 |
fungi | yoctozepto: the promote job writes to the read-write volume. every 5 minutes a cronjob runs to sync that read-write volume to read-only replicas. this can take upwards of a few minutes (and occasionally much longer if queued up behind a particularly large tarballs site update or something), but generally within 10 minutes after completion of the promote job you should see it reflected | 14:25 |
fungi | on the website | 14:25 |
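A sketch of the periodic sync described above; the actual OpenDev job is driven from the bridge host, but the effect is a regular vos release of the docs volume, e.g.:

```sh
# Illustrative cron entry only (lock path and schedule are examples): every
# five minutes, skip if a release is already in progress, otherwise push the
# read-write docs volume to its read-only replicas.
*/5 * * * * root flock -n /var/run/release-docs.lock vos release docs -localauth
```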
yoctozepto | fungi: thanks, that's really helpful; I will report if it ever takes longer than one hour for these to appear | 14:26 |
fungi | thanks | 14:28 |
yoctozepto | I have a related question too; does some process try to reconcile failures such as: https://zuul.opendev.org/t/openstack/build/d0cbe655424e4b1995fdcd39eae43221 or is it only best-effort under the assumption that some later promote happens and succeeds? | 14:30 |
fungi | yoctozepto: it's assumed that a subsequent release notes publication job will succeed later and include the earlier content. also while those jobs are triggered on each branch, they all build and upload notes for all branches so as long as the job succeeds later on some branch the content should be incorporated | 14:34 |
fungi | as for that exact failure, i wonder if there's a problem with the afs driver on one of the executors... looking into it now | 14:36 |
yoctozepto | fungi: thanks and thanks | 14:36 |
fungi | looks like it ran from ze12 according to the inventory | 14:36 |
fungi | no recent afs errors in its dmesg though | 14:37 |
fungi | oh, i should have looked at https://zuul.opendev.org/t/openstack/build/d0cbe655424e4b1995fdcd39eae43221/console#1/0/28/localhost closely | 14:38 |
fungi | rsync: rename "/afs/.openstack.org/docs/releasenotes/kolla-ansible/.victoria.html.mxBbLh" -> "victoria.html": No such file or directory (2) | 14:38 |
fungi | we sometimes see that when multiple release notes jobs are running at the same time and trying to update the same path | 14:39 |
fungi | and one build deletes the other build's tempfile | 14:39 |
yoctozepto | makes sense; would need a mutex to solve for good | 14:40 |
yoctozepto | anyhow, good to know the details | 14:40 |
fungi | yep, the 803845,1 build for stable/wallaby was running at exactly the same time | 14:40 |
fungi | we'd need to do cross-branch mutexes to avoid this, right | 14:41 |
fungi | basically there were updates for the release notes on the stable/ussuri and stable/wallaby branches of the same project trying to rsync --delete at the exact same time | 14:42 |
fungi | so one deleted the other's tempfiles because they weren't expected content | 14:42 |
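The failure mode comes from two concurrent publication steps rsyncing into the same target directory; a simplified illustration (paths are examples, not the exact job invocation):

```sh
# Each build runs roughly this against the shared read-write volume:
rsync -a --delete docs/releasenotes/html/ \
    /afs/.openstack.org/docs/releasenotes/kolla-ansible/
# rsync writes each file to a dotted temporary name (e.g. .victoria.html.mxBbLh)
# and renames it into place; a concurrent --delete pass from the other build
# can remove that temp file first, so the rename fails with
# "No such file or directory", as in the error quoted above.
```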
yoctozepto | can't be truer | 14:43 |
*** ykarel is now known as ykarel|away | 14:48 | |
*** jpena is now known as jpena|off | 15:32 | |
*** rpittau is now known as rpittau|afk | 16:26 | |
zul | Where would I be adding an envlist for this one? https://zuul.opendev.org/t/openstack/build/005f12d679ef4891ae00d87008ca681f | 16:59 |
clarkb | zul: if you go to https://zuul.opendev.org/t/openstack/build/005f12d679ef4891ae00d87008ca681f/console (it's the console button on the right side of the link you gave above) you'll see it is the tox role that is failing (it's the 'tox:' prefix that gives that away). Then you can look up that role in https://opendev.org/zuul/zuul-jobs to find the args it takes: | 17:02 |
clarkb | https://opendev.org/zuul/zuul-jobs/raw/branch/master/roles/tox/README.rst | 17:02 |
clarkb | looks like you want to set a var called tox_envlist on the job | 17:03 |
zul | thanks | 17:03 |
*** rlandy|ruck is now known as rlandy|drappt | 17:21 | |
*** rlandy|drappt is now known as rlandy | 19:43 | |
*** rlandy is now known as rlandy|ruck | 19:45 |